In the following, we assume the reader has already looked at the Getting started page.
This document attempts to be as self-contained as possible. Blang is designed to be usable with very minimal past programming exposure. However, to fully master the advanced features of Blang, familiarity with a modern multi-paradigm statically typed language is recommended.
Additionally, readers familiar with Java who use the Blang IDE can right-click on any Blang file and select Open generated file to see the Java code generated by Blang.
The information stored by programs into a computer's memory is highly structured into chunks. Each chunk is called an object. Some of the objects have shared properties. A property that allows us to group objects is called a type.
Concretely, types in Blang are equivalent to Java types (a terminology that encompasses Java classes, interfaces, primitives, enumerations and annotation interfaces). This means any Java type can be imported and used in Blang, and any model defined in Blang can be imported and used in Java with no extra work needed.
Single line comments use the syntax // some comment spanning the rest of the line.
Multi-line comments use /* many lines can go here */.
In the following, we use comments to give contextual explanation on syntax examples.
The syntax for Blang models is as follows:
In the next few sections we describe each element in the above example in detail.
Packages in Blang work as in Java. Their syntax is similar too, except that the semicolon can be skipped in Blang.
Similarly, import statements mirror their Java counterparts, for example:
import static is used to import a function (the function called col in the above example), while standard imports are for types. The name "static" comes from Java, where "static method" means a "regular" function, i.e. a procedure that exists outside of the context of an object.
Blang additionally supports extension import directives of the form
Extension methods are described in the XExpression section below.
Blang automatically imports:
all the types in the following packages: blang.core, blang.distributions, blang.io, blang.types, blang.mcmc, java.util, xlinear;
all the static functions in the following files: xlinear.MatrixOperations, bayonet.math.SpecialFunctions, org.apache.commons.math3.util.CombinatoricsUtils, blang.types.StaticUtils;
as static extensions, all the static functions in the following files: xlinear.MatrixExtensions, blang.types.ExtensionUtils.
Variables are declared using the following syntax:
Each variable is declared as either random or param (parameter).
This controls how the model can be invoked by other models.
Parameters can become random when the model is used in the context of a larger model.
In the above example, Type1, Type2, ... can syntactically be of any type. However, at runtime some requirements must be met by these types.
Initialization blocks are XExpressions, which admit a rich and concise, Turing-complete syntax, described in more detail below. For multi-line expressions, surround the code with curly braces, { and }. For one-line expressions, braces can be skipped.
Initialization blocks are used in the main model to provide alternatives to command line arguments. In other words, Blang calls them when no command line arguments are provided for the variable associated with the init block. If an argument is provided for the variable, the init block will be ignored.
Initialization blocks can use the values of the other variables listed previously.
Initialization blocks are currently not used when the model is invoked by a parent model.
Laws (prior or conditional distributions) can be either defined by invoking another Blang model (composite laws) or by defining one or several factors.
During initialization, composite laws are recursively unrolled until a list of all factors is created. The final model is the product of all these factors (more precisely, the sum of all the log-factors).
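Working with the sum of log-factors rather than the product of factors can be sketched in plain Java as follows. The class and method names are hypothetical and are not part of the Blang runtime; the sketch only shows the arithmetic.

```java
final class LogFactorSum {
    // Sums log-factors. If any factor is -Infinity (zero probability),
    // the total is -Infinity as well.
    static double logDensity(double... logFactors) {
        double sum = 0.0;
        for (double logFactor : logFactors) {
            sum += logFactor;
        }
        return sum;
    }

    public static void main(String[] args) {
        double total = logDensity(Math.log(0.5), Math.log(0.25));
        // Exponentiating recovers the product 0.5 * 0.25, up to rounding.
        System.out.println(Math.exp(total));
    }
}
```

Summing in log scale avoids the numerical underflow that multiplying many small probabilities would cause.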
Composite laws are declared as follows:
DistributionName refers to another Blang model, properly imported unless it is in an automatically imported package.
Variables variableExpression1, variableExpression2, ... are matched from left to right in the same order as the random variables are declared in the imported Blang model. Each can be an arbitrary XExpression, which is executed once at initialization time.
One important example is an XExpression that just returns a variable or parameter declared previously via the keyword random or param. Another important example is obtaining a variable from a collection of variables, e.g. someList.get(anIndex).
Each conditioning expression in conditioning1, conditioning2, ... can take one of two possible forms. First, it can be simply one of the names declared via the keyword random or param. Second, it can be a declaration of the form TypeName conditioningName = XExpression, where the XExpression is executed once at initialization time. An example of the latter from an HMM: IntVar prevState = states.get(time - 1).
Each argument in expression1, expression2, ... is an XExpression matched from left to right in the same order as the param variables are declared in the imported Blang model. In contrast to the other XExpressions discussed above, the XExpressions in the arguments are recomputed each time the factors are evaluated during inference. This corresponds to the natural behaviour one would expect from the mathematical notation. For example, if we write x | y ~ Normal(f(y), 1.0), we expect the expression for the mean, f(y), to be recomputed at each evaluation.
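The difference between an argument evaluated once and an argument recomputed at each evaluation can be illustrated in plain Java. This is only a sketch of the idea, not the code Blang generates: the class LazyArgumentDemo, the Holder type standing in for the variable y, and the choice f(y) = 2y are all hypothetical.

```java
import java.util.function.DoubleSupplier;

final class LazyArgumentDemo {
    // Mutable holder standing in for a random variable y (hypothetical type).
    static final class Holder {
        double value;
        Holder(double value) { this.value = value; }
    }

    // Log-density of Normal(mean, 1.0) at x. The mean is passed as a supplier,
    // so it is re-evaluated on every call instead of being frozen at creation.
    static double logNormalUnitVariance(double x, DoubleSupplier mean) {
        double d = x - mean.getAsDouble();
        return -0.5 * Math.log(2.0 * Math.PI) - 0.5 * d * d;
    }

    public static void main(String[] args) {
        Holder y = new Holder(0.0);
        DoubleSupplier mean = () -> 2.0 * y.value; // plays the role of f(y)
        System.out.println(logNormalUnitVariance(1.0, mean));
        y.value = 0.5; // y changes, e.g. after a sampling move
        System.out.println(logNormalUnitVariance(1.0, mean)); // uses the new mean
    }
}
```

Passing the mean as a plain double instead would freeze its value at the time of the call, which is the behaviour of the XExpressions executed once at initialization.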
Atomic laws are declared as follows:
The block { ... } contains an XExpression returning the probability for this factor as a double in log scale.
Each argument in expression1, expression2, ... can take one of two possible forms. First, it can be simply one of the names declared via the keyword random or param. Second, it can be a declaration of the form TypeName conditioningName = XExpression, where the XExpression is executed once at initialization time.
Indicator factors are used to mark the support of a distribution:
The block { ... } contains an XExpression returning a boolean value indicating whether the current configuration is in the support.
Each argument in expression1, expression2, ... is defined as in numerical factors.
In certain cases, it is useful to mark a variable as having special constraints, to disable the standard sampling machinery and use specialized samplers instead. For example, this should be used when a simplex is used in a model to avoid attempting naive sampling of the individual entries.
For loops appearing at the top level of a laws block should follow the syntax
The range is an XExpression returning an instance of Iterable. Also, the range should not be random (i.e. should not change during sampling). However, sampling of infinite-dimensional objects can be handled by creating dedicated types. Note that loops inside XExpressions (described later in this document) are much more general.
Any loops in Blang can be nested.
The optional generate block, an XExpression, provides an alternative but equivalent description of the same model. It is syntactically optional, but is required in certain but not all runtime contexts.
If all the laws in the model are composite, and the components already provide generate blocks, it is not necessary to provide a generate block.
Conversely, if all the laws are atomic, providing a generate block is necessary for several runtime tasks, including:
Sequential change of measure methods, which use exact samples from the prior as the initial probability measure in a sequence of measures ending in the target posterior distribution.
Correctness tests, which rely on testing the equivalence between the laws and generate implementations.
The argument name nameOfRandomObject in generate(nameOfRandomObject) provides an arbitrary name for the input random number generator. No type is provided, as the argument is always a subtype of java.util.Random. In practice, the runtime uses the subtype bayonet.distributions.Random, which provides a better algorithm (a Mersenne twister) as well as cross-compatibility between java.util.Random and Apache Commons' RandomGenerator objects, and more.
If the model has exactly one random variable of type IntVar or RealVar, then the generate block should return an integer or double respectively, corresponding to the new realization. Otherwise, the generate block should modify the random variable(s) in place.
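The two conventions (returning a new realization versus modifying variables in place) can be sketched in plain Java with java.util.Random. This is an illustration of forward simulation under assumed names, not Blang's generate syntax: GenerateSketch, generateExponential and generateInPlace are hypothetical, and the exponential distribution is chosen only because it is easy to sample by inverting its CDF.

```java
import java.util.Random;

final class GenerateSketch {
    // Single real-valued variable: return the new realization.
    static double generateExponential(Random rand, double rate) {
        // Inverse-CDF method. 1 - nextDouble() lies in (0, 1], so the log is finite.
        return -Math.log(1.0 - rand.nextDouble()) / rate;
    }

    // Several variables: overwrite the existing storage in place.
    static void generateInPlace(Random rand, double[] values, double rate) {
        for (int i = 0; i < values.length; i++) {
            values[i] = generateExponential(rand, rate);
        }
    }

    public static void main(String[] args) {
        Random rand = new Random(1L); // fixed seed for reproducibility
        System.out.println(generateExponential(rand, 2.0));
        double[] values = new double[3];
        generateInPlace(rand, values, 2.0);
        System.out.println(java.util.Arrays.toString(values));
    }
}
```

All randomness comes from the generator passed as argument, which is what makes runs reproducible for a given seed.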
The syntax for XExpressions is provided by the Xtext language engineering framework.
XExpressions are also used by Xtend, an expressive language built on top of Java providing "powerful macros, lambdas, operator overloading and many more modern language features". We use Xtend to write some parts of the runtime machinery.
We review the main aspects of XExpressions relevant for writing Blang models here for completeness, following the structure of the official Xtend documentation.
Types can be categorized as follows:
primitives, which are low-level building blocks. Those relevant here are boolean, int and double. They work as in Java;
object references, which can be thought of as an annotated address to a memory location (possibly null);
array references. This last category is rarely directly used in Blang. Instead, use higher-level constructs provided by the Java SDK, such as objects of type ArrayList, String.
The following expressions create constants of various types:
boolean: true, false
int: e.g. 42, 12_000
double: make sure to include a decimal point, e.g. 1.0, or use scientific notation, e.g. 1.3e2
type literals, e.g. String, which is equivalent to Java's String.class
List: e.g. #[true, false]
Set: e.g. #{"red", "blue", "green"}
Pair, with arbitrary key type and value type: e.g. "likelihood" -> -123.43
Map: e.g. #{"key" -> 1, "key2" -> 2}
Some examples:
In the first example, val encodes that the variable cannot be changed (in the same sense as Java's final keyword for variables), while the other examples use var, encoding the fact that the variable can be changed afterwards.
The meaning of immutability is simple to understand in the case of primitives, but it should be interpreted carefully in the context of references. In the latter, it means that the reference will always point to the same object in the heap; however, the internal state of that object might change over its lifetime.
In the third line of the example, the type is inferred automatically (here as List). Such automatic type inference is often, but not always, possible. We recommend avoiding this construct, however, to maintain readability.
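Since val corresponds to Java's final, the distinction between a fixed reference and a mutable object can be seen directly in Java. The following is a minimal sketch with a hypothetical class name:

```java
import java.util.ArrayList;
import java.util.List;

final class FinalReferenceDemo {
    static List<String> demo() {
        // Like val: the reference always points to the same list object.
        final List<String> names = new ArrayList<>();
        names.add("first"); // allowed: the state of the object changes
        // names = new ArrayList<>(); // would not compile: a final variable cannot be reassigned

        // Like var: the variable can be reassigned.
        int counter = 0;
        counter = counter + 1;
        names.add("count=" + counter);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```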
Conditional expressions have the form
Optionally, they can have an else clause. Also, the pair of if and else is an expression (i.e. returns a value). If else is not included, else null is used implicitly to maintain an expression interpretation.
Instead of chaining several if and else, use a switch, which is significantly more functional than Java's, as it relies on calling .equals by default, and does not require calling break after each clause:
Several loop variants are allowed:
High-level for loop, for (IteratorType iteratorName : range) { ... }, as in laws blocks but without restrictions on the range being fixed during sampling, e.g. for (String s : #["a", "b"]) { println(s) }. The type of the iterator can be skipped (not recommended for readability).
Basic for loop, for (var IteratorType iteratorName = init; condition; update) { ... }, e.g. for (var int i = 0; i <= 10; i++) { ... }.
While loops, while (condition) { ... }.
Do-while, where the body is executed at least once, do { ... } while (condition).
Functions are called as in most languages, i.e. nameOfFunction(expression1, expression2, ...), where each element in expression1, expression2, ... is itself an XExpression. These expressions are evaluated first, then the results of these evaluations are passed in to the function ("eager evaluation", as in Java for example). The only exception is the composite laws listed in laws { .. }, as described above, in which case evaluation of the arguments is not performed at initialization but instead repeated each time the density is evaluated during MCMC sampling (a form of "lazy evaluation" in this unique special case).
In all cases, the actual function call only involves copying a constant size register so that function calls are always very cheap. For primitives, the value of the primitive is copied (and hence the original primitive can never suffer side effects from the call). For references, the memory address in the reference is copied (and hence the original reference cannot be changed, although the object it points to might have its state changed by the function call).
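These calling conventions are the same as Java's, so they can be checked with a short Java sketch. The class and method names are hypothetical:

```java
final class PassByValueDemo {
    // The primitive is copied: only the local copy changes.
    static void increment(int x) {
        x = x + 1;
    }

    // The reference is copied, but both copies point to the same object,
    // so the caller sees the change of state.
    static void append(StringBuilder sb) {
        sb.append("!");
    }

    // Rebinding the copied reference has no effect on the caller's reference.
    static void reassign(StringBuilder sb) {
        sb = new StringBuilder("new");
    }

    public static void main(String[] args) {
        int n = 1;
        increment(n);
        StringBuilder sb = new StringBuilder("hi");
        append(sb);
        reassign(sb);
        System.out.println(n + " " + sb); // n is unchanged, sb holds "hi!"
    }
}
```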
To create your own function, create a separate Java or Xtend file. In Java, use:
In Xtend:
Then, add import static my.pack.MyFunction.* to your Blang file. You will now be able to call myFunction(arg1, arg2).
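As an illustration, a Java file providing such a function could look like the sketch below. The argument types and the body are hypothetical and chosen only for the example. In a real project the file would be placed at my/pack/MyFunction.java, would start with the line package my.pack; and would declare the class public so that it can be imported from Blang; both are left out here to keep the snippet compilable on its own.

```java
// Sketch only: add `package my.pack;` and make the class public in a real project.
final class MyFunction {
    private MyFunction() {} // no instances: the class only holds static functions

    // Hypothetical example function taking two arguments.
    public static double myFunction(double arg1, double arg2) {
        return arg1 * arg1 + arg2;
    }
}
```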
Objects are created by writing new NameOfClass(argument1, ...). This can be shortened to new NameOfClass if there are no arguments.
In some libraries, the call to new is wrapped inside a static function. In this case, just call the function to instantiate the object.
Classes have instance variables or fields (variables guaranteed to be available for a given type), as well as (instance) methods (functions associated with the object having access to the object's instance variables). Collectively, fields and methods are called features.
Features are accessed using the "dot" notation: object.variable and object.method(...). When a method has no argument, the call can be shortened to object.method.
The ability to call a feature is subject to Java visibility constraints. In short, only public features can be called from outside the file declaring a class.
The special variable it provides a default receiver for feature calls. For example:
A "lambda expression" is an unfortunate name for a simple concept: a succinct way to write a function without having to give it a name. This makes it easy to call functions which take functions as arguments (e.g. to apply the function to each item in a list, etc). Since they are so useful, many syntactic shortcuts are available.
Explicit syntax for lambda expressions is:
For example, to capitalize words in a list:
When there is a single input argument, you can skip declaring the argument, and instead the argument will be assigned to it (described in the previous section). This allows us to write for example:
Finally, when the last argument of a function is a function, you can simply put the lambda after the parentheses of the function call. For example:
Lambda expressions can also access final variables (i.e. marked by val) that are in scope.
Lambda expressions can be automatically cast to interfaces having a single declared method.
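Java follows the same two rules (conversion to single-method interfaces, and capture of final variables), which the following Java sketch illustrates. The class LambdaDemo and its methods are hypothetical; java.util.function.Function is a standard interface with a single abstract method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

final class LambdaDemo {
    // Applies f to each item of the list and collects the results.
    static <A, B> List<B> map(List<A> items, Function<A, B> f) {
        List<B> result = new ArrayList<>();
        for (A item : items) {
            result.add(f.apply(item));
        }
        return result;
    }

    static List<String> upperCased(List<String> words) {
        final String suffix = "!"; // final variable captured by the lambda
        // The lambda is converted to a Function<String, String> automatically.
        return map(words, w -> w.toUpperCase() + suffix);
    }

    public static void main(String[] args) {
        System.out.println(upperCased(Arrays.asList("a", "bc")));
    }
}
```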
Type casts work as in Java, but with a more readable syntax: aDoubleVariable as int instead of (int) aDoubleVariable.
Boxing refers to wrapping a primitive such as int or double into an object such as Integer or Double. Deboxing is the reverse process. As in Java, the conversion between the two (boxing/deboxing) is automatic in the vast majority of the cases. Blang adds boxing/deboxing to and from IntVar and RealVar.
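The automatic conversions inherited from Java can be illustrated with the following Java sketch. It covers only int and Integer, since the conversions to and from IntVar and RealVar are specific to Blang; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class BoxingDemo {
    static List<Integer> firstSquares(int n) {
        List<Integer> squares = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            squares.add(i * i); // boxing: int is wrapped into an Integer
        }
        return squares;
    }

    static int sum(List<Integer> boxed) {
        int total = 0;
        for (Integer value : boxed) {
            total += value; // deboxing: Integer is unwrapped into an int
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(firstSquares(3))); // 1 + 4 + 9
    }
}
```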
The scope of a variable is the subset of the code in which it can be accessed. Scoping in Blang generally works as in most programming languages: to find the scope of a variable, identify the parent set of braces; these determine the region of the code where the variable can be accessed. If one variable reference is in the scope of several variables declared with the same name, the innermost set of braces has priority.
The only exception is the arguments of the atomic and composite laws, which require explicit identification of the variables to include in the scope. These variables to be included should be identified at the right of the | symbol. We make this modification because these scoping dependencies drive the inference of the sparsity patterns in the graphical model.
Operator overloading is permitted. One important case to be aware of is ==, which is overloaded to .equals(..). Use === for the low-level equality operator that checks if the two sides are identical (with the exception of Double.NaN, Not a Number, which following IEEE convention is never equal to anything).
When in the Blang IDE, command-click on an operator to reveal its definition.
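The two notions of equality can be contrasted in Java, where .equals compares content and Java's own == on references compares identity. The sketch below uses hypothetical names and shows Java semantics only; it is meant as an analogy for Blang's == and === rather than a description of how Blang compiles them.

```java
final class EqualityDemo {
    // Content equality, the notion that Blang's == is overloaded to.
    static boolean sameContent(Object a, Object b) {
        return a.equals(b);
    }

    // Identity: true only if both references point to the same object.
    static boolean sameObject(Object a, Object b) {
        return a == b;
    }

    // Following the IEEE convention, NaN is not equal to itself as a primitive.
    static boolean nanEqualsItself() {
        double nan = Double.NaN;
        return nan == nan;
    }

    public static void main(String[] args) {
        String a = new String("x");
        String b = new String("x"); // distinct object with the same content
        System.out.println(sameContent(a, b)); // true
        System.out.println(sameObject(a, b));  // false
        System.out.println(nanEqualsItself()); // false
    }
}
```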
Some useful operators automatically imported:
object => lambdaExpression: calls the lambda expression with the input given by object, e.g. new ArrayList => [add("to be added in list")]
range operators, for example 0 .. 10, 0 ..< 11, -1 >.. 10; all these examples return the integers \( 0, 1, 2, \dots, 10 \).
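For readers coming from Java, the three range operators correspond roughly to the following uses of java.util.stream.IntStream. This is an analogy under hypothetical method names, not the way the operators are implemented.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

final class RangeDemo {
    // Analogue of from .. to (both ends included).
    static int[] closed(int from, int to) {
        return IntStream.rangeClosed(from, to).toArray();
    }

    // Analogue of from ..< to (upper end excluded).
    static int[] upperExclusive(int from, int to) {
        return IntStream.range(from, to).toArray();
    }

    // Analogue of from >.. to (lower end excluded).
    static int[] lowerExclusive(int from, int to) {
        return IntStream.rangeClosed(from + 1, to).toArray();
    }

    public static void main(String[] args) {
        // All three calls produce the integers 0 to 10.
        System.out.println(Arrays.toString(closed(0, 10)));
        System.out.println(Arrays.toString(upperExclusive(0, 11)));
        System.out.println(Arrays.toString(lowerExclusive(-1, 10)));
    }
}
```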
See the Xtend documentation if you want to overload operators in custom types.
Extension methods provide a kind of lightweight trait, i.e. adding methods to existing classes on demand.
This is done by adding an extension import:
You can then write arg1.myfunction(arg2, ...) instead of myfunction(arg1, arg2, ...).
Types can be parameterized. For example, to use Java's List type, it is preferable to specify the type that will be stored in the list. To declare that strings will be stored, use List<String> as in Java or Xtend.
Models can use variables with type parameters but models themselves cannot have type parameters at the moment.
Throw exceptions to signal abnormal behaviour and to terminate the Blang runtime with an informative message:
To signal that the current factor has invalid parameters, return the value -INFINITY if possible. If the code structure makes this awkward, use blang.types.StaticUtils.invalidParameter instead, which will be caught and interpreted as the factor having zero probability.
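The first option, returning negative infinity in log scale, can be sketched in Java as follows. The helper is hypothetical and uses an exponential density only as an example; it does not call blang.types.StaticUtils.

```java
final class InvalidParameterDemo {
    // Log-density of an exponential distribution with the given rate, evaluated at x.
    static double logExponentialDensity(double x, double rate) {
        if (Double.isNaN(rate) || rate <= 0.0) {
            // Invalid parameter: report zero probability in log scale.
            return Double.NEGATIVE_INFINITY;
        }
        if (x < 0.0) {
            // Outside the support: also zero probability.
            return Double.NEGATIVE_INFINITY;
        }
        return Math.log(rate) - rate * x;
    }

    public static void main(String[] args) {
        System.out.println(logExponentialDensity(1.0, 2.0));
        System.out.println(logExponentialDensity(1.0, -2.0)); // -Infinity
    }
}
```

Checking the parameters before any arithmetic keeps NaN values from propagating into the sum of log-factors.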
In contrast to Java, exceptions are never required to be declared or caught. If they need to be caught, the syntax is:
There are a few other aspects of XExpressions that we haven't covered here:
the synchronized keyword and a rich parallelization library;
optional dispatch methods, which allow mixing and matching static and runtime method polymorphism;
active annotations, which along with the reflection API, allow powerful meta-programming;
built-in string templates.
Detailed descriptions of these features can be found in the Xtend documentation.
As customary, Java types should be created in separate files and imported into Blang models as needed. The separate files can be written either in Java or in Xtend.