All you need to get started is available in this zip file:
Really, "installing" amounts to unzipping and copying the contents. The folder contains the IDE, a template for your own projects, and some command line tools.
The first time you try to launch BlangIDE, depending on the version of Mac OS X and/or your security settings, you may get a message saying the "app is not registered with Apple by an identified developer". To work around this, follow these instructions (from Apple) the first time you open BlangIDE (Mac OS will then remember your decision for subsequent launches):
In the Finder, locate BlangIDE (don't use Launchpad to do this).
Control-click the app icon, then choose Open from the shortcut menu.
If this does not work, an alternative is also described in the same Apple help page.
While the download proceeds, here is a short tutorial on Blang.
A Blang model specifies a joint probability distribution over a collection of random variables.
Here is an example, based on a very simple model for the famous Doomsday argument:
Doomsday is just a name we give to this model. As a convention, we encourage users to capitalize model names (Blang is case-sensitive).
Variables need to specify their type. For example, in random RealVar z the variable is of type RealVar and we give it the name z. Other important built-in types include IntVar and DenseMatrix.
random and param are Blang keywords. We will get back to the difference between the two.
As a convention, types are capitalized, variable names are not.
The section laws { ... } defines the distributions and conditional distributions of the random variables. The syntax is the same as the notation used in probability theory. For example, y | z ~ ContinuousUniform(0.0, z) means that the conditional distribution of y given z is uniform between zero and z.
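To make the notation concrete outside of Blang, here is a minimal Python sketch of forward simulation from a model of this shape. The line for y mirrors y | z ~ ContinuousUniform(0.0, z); the prior on z (an Exponential with parameter rate) and the function name simulate_doomsday are assumptions made for illustration only.

```python
import random


def simulate_doomsday(rate, rng):
    """Draw one (z, y) pair by forward simulation: z first, then y given z."""
    # Assumption: z | rate ~ Exponential(rate). This prior is illustrative.
    z = rng.expovariate(rate)
    # y | z ~ ContinuousUniform(0.0, z), as in the line discussed above.
    y = rng.uniform(0.0, z)
    return z, y


rng = random.Random(1)  # fixed seed so the sketch is reproducible
draws = [simulate_doomsday(1.0, rng) for _ in range(1000)]
```

Every simulated y falls between zero and its own z, which is exactly what the conditional statement says.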
Each Blang model is turned into a program supporting various inference methods. To demonstrate that, let's run the above example.
Set up one of these two methods: running Blang with the Web App, or with the Blang IDE.
Once you have followed the above steps, you will get a message about missing arguments. These arguments essentially control the data the model should condition on, as well as the algorithm used to approximate the conditional expectation (the 'inference engine'). The arguments are discovered automatically with minimal help from some annotations. We will cover that later. For now, let's provide the minimal set:
This specifies values for rate and y, and marks z as missing (unobserved, and hence sampled over). You will see the following output:
The most important piece of information here is the outputFolder. Look into that directory. You will find the samples in samples/z.csv, in a tidy format ready to be used by any sane data analysis tool.
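As one example of such a tool, the Python sketch below summarizes a tidy samples file using only the standard library. The column names sample and value, the helper name posterior_mean, and the three data rows are assumptions for illustration; check the header of your own z.csv before reusing it.

```python
import csv
import io
import statistics


def posterior_mean(csv_file, value_column="value"):
    """Average the sampled values found in a tidy CSV (one row per sample)."""
    reader = csv.DictReader(csv_file)
    values = [float(row[value_column]) for row in reader]
    return statistics.fmean(values)


# Hypothetical file contents standing in for samples/z.csv.
example = io.StringIO("sample,value\n0,1.5\n1,2.5\n2,2.0\n")
mean_z = posterior_mean(example)
```

With a real run, pass an open handle on the samples/z.csv file found inside the reported outputFolder instead of the in-memory example.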
You can also view the list of all arguments by adding the argument --help.
Let's look at how ContinuousUniform is implemented in the SDK. Since the SDK is written in Blang, you will proceed in exactly the same way to create your own distributions. Control-click on ContinuousUniform in Blang IDE, and you will be taken to its definition:
The syntax should be self-explanatory:
the laws block defines the log density as the sum of the listed log density factors logf (indicator is just a shortcut for 0-1 factors),
the optional generate block specifies a forward sampling procedure.
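The two roles described above can be sketched in Python, independently of Blang's own syntax. The function names below are hypothetical; the sketch only shows how a 0-1 factor and a log density factor add up to the log density of a uniform distribution, and it assumes lower < upper.

```python
import math


def indicator(condition):
    """A 0-1 factor on the log scale: log(1) = 0 if satisfied, log(0) = -inf if not."""
    return 0.0 if condition else -math.inf


def uniform_log_density(x, lower, upper):
    """Log density of a uniform on [lower, upper], written as a sum of log factors."""
    factors = [
        -math.log(upper - lower),        # normalization factor (assumes lower < upper)
        indicator(lower <= x <= upper),  # support constraint as a 0-1 factor
    ]
    return sum(factors)


inside = uniform_log_density(0.5, 0.0, 2.0)
outside = uniform_log_density(3.0, 0.0, 2.0)
```

A point outside the support receives a log density of negative infinity, because a single violated 0-1 factor dominates the sum.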
The bodies of logf, indicator, and generate admit a rich and concise, Turing-complete syntax. We will refer to such a block as an XExpression. We will talk more about it later on.
Another important method for creating models is by composing and transforming one or more other distributions. Look at the definition of Exponential for example:
Models constructed using an explicit density (like ContinuousUniform) and those constructed by composition (like Exponential) are invoked in the same way:
where the random variables are listed in the same order as the variables marked by the keyword random appear in the invoked model definition, and the parameters are listed in the same order as the variables marked by the keyword param.
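To illustrate the idea of construction by composition outside of Blang, the Python sketch below expresses an exponential log density by delegating to a gamma log density with shape 1. This relies on a standard probability identity; it is not presented as the SDK's definition of Exponential, and the function names are hypothetical.

```python
import math


def gamma_log_density(x, shape, rate):
    """Log density of a Gamma(shape, rate) distribution at x > 0."""
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)


def exponential_log_density(x, rate):
    """Exponential(rate) obtained by composition: a Gamma with shape fixed to 1."""
    return gamma_log_density(x, 1.0, rate)


value = exponential_log_density(2.0, 0.5)
```

The caller of exponential_log_density does not need to know whether the density was written out explicitly or obtained by reusing another distribution, which is the point made above about invoking both kinds of model in the same way.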
To create your own distribution, simply create a new .bl file in your project. When you want to use it in another file, don't forget to add an import declaration after the package declaration (only certain packages are automatically imported, such as blang.distributions).
A plate is simply an element of a graphical model which is repeated many times. Let's look for example at a simple hierarchical modelling problem: suppose you have a data file failure_counts.csv of this form:
Each row contains a Launch Vehicle (LV) type, and the number of successful launches for that type of rocket, as well as the total number of launches. We would like to get a posterior distribution over the failure probability of each LV type via a hierarchical model that borrows strength across types. Here is a Blang model that does that:
The for loop here uses plates and plated objects to set up a large graphical model. More generally, the syntax is for (IteratorType iteratorName : collection) { ... }, where collection is any instance of the Iterable interface.
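The role of the loop over a plate can be sketched in Python: each index of the plate contributes the same kind of factors to the log joint density, evaluated on that index's own data and latent variable. The beta-binomial structure, the function names, and the two rocket types with their counts are all assumptions made for illustration; they are not taken from the model or data file discussed above.

```python
import math


def binomial_log_pmf(k, n, p):
    """Log probability of k successes in n trials with success probability p."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)


def beta_log_density(p, a, b):
    """Log density of a Beta(a, b) distribution at 0 < p < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(p) + (b - 1.0) * math.log1p(-p)


def log_joint(plate, launches, successes, failure_prob, a, b):
    """Sum the factors contributed by each index of the plate."""
    total = 0.0
    for rocket_type in plate:  # plays the role of the for loop over the plate
        p_fail = failure_prob[rocket_type]
        # Shared hyperparameters a and b let the types borrow strength.
        total += beta_log_density(p_fail, a, b)
        total += binomial_log_pmf(successes[rocket_type], launches[rocket_type],
                                  1.0 - p_fail)
    return total


# Hypothetical plate indices and data.
plate = ["A", "B"]
launches = {"A": 10, "B": 4}
successes = {"A": 9, "B": 4}
failure_prob = {"A": 0.1, "B": 0.2}
joint = log_joint(plate, launches, successes, failure_prob, 1.0, 1.0)
```

Adding a rocket type only lengthens the plate and the data dictionaries; the loop body stays the same, which is what makes plates convenient for large graphical models.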
To run the HierarchicalModel example, use the following options:
The first option corresponds to the line param GlobalDataSource data in the Blang model. This provides a default CSV file in which to look for data for all the Plate and Plated variables (a "Plated" type is just a variable that sits within a plate, i.e. that is repeated).
By default, all the Plate and Plated variables will look for a column with a name corresponding to the one given in the Blang file. We only need to override this default for the rocketTypes plate, by setting the command line argument --model.rocketTypes.name LV.Type.
Arbitrary Java or Xtend types are interoperable with Blang. When you want to use them as latent variables, some additional work is needed. However, Blang provides utilities to assist you in this process, in particular for testing correctness.
As a first example, let's look at how sampling is implemented for Simplex variables in the SDK (i.e. vectors where the entries are constrained to sum to one). Sampling this variable requires special attention because of the sum-to-one constraint.
After implementing the class DenseSimplex (just a plain Java class, based on an n-by-1 matrix), we add an annotation to point to the sampler that we will design: @Samplers(SimplexSampler).
Here is the sampler:
The actual work is done in the execute method. The SimplexWritableVariable is just a utility which ensures that, when the entry at index sampledDim is altered in the simplex, the entry at the following index (modulo the number of entries) is decreased by the same amount. After picking an index, we use a slice sampler to perform the actual sampling.
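The compensation idea can be sketched in a few lines of Python. The class below is a hypothetical stand-in written for illustration, not the SDK's SimplexWritableVariable: it changes one entry and subtracts the same amount from the following entry (modulo the number of entries), so the vector keeps summing to one.

```python
class SimplexEntryView:
    """Hypothetical stand-in: writes one simplex entry and compensates on the next."""

    def __init__(self, simplex, index):
        self.simplex = simplex
        self.index = index
        # The compensating entry wraps around to 0 after the last index.
        self.next_index = (index + 1) % len(simplex)

    def set_value(self, new_value):
        """Set the entry at index; the entry at next_index absorbs the change."""
        delta = new_value - self.simplex[self.index]
        compensated = self.simplex[self.next_index] - delta
        if new_value < 0.0 or compensated < 0.0:
            raise ValueError("move would leave the simplex")
        self.simplex[self.index] = new_value
        self.simplex[self.next_index] = compensated


simplex = [0.2, 0.3, 0.5]
view = SimplexEntryView(simplex, 2)  # last entry, so the next index wraps to 0
view.set_value(0.6)
```

A one-dimensional sampler such as a slice sampler can then treat the view as an ordinary real variable, with the constraint handled behind the scenes.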
The instantiation of samplers is automated. The instance variables annotated with @SampledVariable and @ConnectedFactor guide this process. @SampledVariable is filled with the variable to be sampled. Then all the factors connected to this variable need to be assigned to @ConnectedFactor for the sampler to be included in the sampling process.
LogScaleFactor is the interface for the factors created by logf and indicator blocks.
Constrained is a factor used to mark variables that require special samplers. For example, the Dirichlet distribution contains the line realization is Constrained to ensure standard samplers for real variables are avoided in the context of a simplex.
The optional method setup performs additional initialization checks if needed. It should return a boolean indicating whether this sampler should be used or not in the current context.
Additional tutorial and reference materials to go more in-depth:
Built-in random variables: building blocks for Blang models.
Input and Output: how to get data into Blang for conditioning, and samples out.
Inference and runtime: how the Blang runtime system performs inference based on Blang models.
Custom types: more details on creating your own types.
Testing: tests used to check the correctness of Blang SDK as well as your distributions, samplers and types.