### Overview of the syntax reference

In the following, we assume the reader has already looked at the Getting started page.

This document attempts to be as self-contained as possible. Blang is designed to be usable with very minimal past programming exposure. However, to fully master the advanced features of Blang, familiarity with a modern multi-paradigm statically typed language is recommended.

Additionally, readers familiar with Java who use the Blang IDE can right-click on any Blang file and select Open generated file to see the Java code generated by Blang.

### Types

The information stored by programs into a computer's memory is highly structured into chunks. Each chunk is called an object. Some of the objects have shared properties. A property that allows us to group objects is called a type.

Concretely, types in Blang are equivalent to Java types (a terminology that encompasses Java classes, interfaces, primitives, enumerations and annotation interfaces). This means any Java type can be imported and used in Blang, and any model defined in Blang can be imported and used in Java with no extra work needed.

Single line comments use the syntax // some comment spanning the rest of the line.

Multi-line comments use /* many lines can go here */.

In the following, we use comments to give contextual explanation on syntax examples.

### Models

The syntax for Blang models is as follows:

package my.namespace // optional
// import statements
model NameOfModel {
  // variables declarations
  laws {
    // laws declaration
  }
  generate(nameOfRandomObject) { // optional
    // generate block
  }
}

In the next few sections we describe each element in the above example in detail.

Packages in Blang work as in Java. Their syntax is similar too, except that the semicolon can be skipped in Blang.

Similarly, import statements work like their Java counterparts, for example:

import org.apache.spark.sql.Dataset
import static org.apache.spark.sql.functions.col

import static is used to import a function (the function called col in the above example), while standard imports are for types. The name "static" comes from Java where "static method" means a "regular" function, i.e. a procedure that exists outside of the context of an object.

Blang additionally supports extension import directives of the form

import static extension my.namespace.myfunction

Extension methods are described in the XExpressions section below.

Blang automatically imports:

• all the types in the following packages: blang.core, blang.distributions, blang.io, blang.types, blang.mcmc, java.util, xlinear;

• all the static functions in the following files: xlinear.MatrixOperations, bayonet.math.SpecialFunctions, org.apache.commons.math3.util.CombinatoricsUtils, blang.types.StaticUtils;

• as static extensions, all the static functions in the following files: xlinear.MatrixExtensions, blang.types.ExtensionUtils.

Variables are declared using the following syntax:

random Type1 name1
random Type2 name2 ?: { ... } // optional initialization block
random Type3 name3 ?: someStaticallyImportedFunction(name2) // another optional init
param Type4 name4
param Type5 name5 ?: { .. } // one more optional init block

Each variable is declared as either random or param (parameter). This controls how the model can be invoked by other models. Parameters can become random when the model is used in the context of a larger model.

In the above example, Type1, Type2, ... can syntactically be of any type. However, at runtime some requirements must be met by these types.

Initialization blocks are XExpressions, which admit a rich and concise, Turing-complete syntax, described in more detail below. For multi-line expressions, surround the code with curly braces, {, }. For one-line expressions, braces can be skipped.

Initialization blocks are used in the main model to provide alternatives to command line arguments. In other words, Blang calls them when no command line arguments are provided for the variable associated with the init block. If an argument is provided for the variable, the init block will be ignored.

Initialization blocks can use the values of the other variables listed previously.

Initialization blocks are not currently used in the context of the model being used by a parent model.

Laws (prior or conditional distributions) can be either defined by invoking another Blang model (composite laws) or by defining one or several factors.

During initialization, composite laws are recursively unrolled until a list of all factors is created. The final model is the product of all these factors (more precisely, the sum of all the log-factors).
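The equivalence between the product of factors and the sum of log-factors can be illustrated with a minimal Java sketch. The class and method names below are hypothetical and are not part of the Blang runtime; the sketch only shows the arithmetic, assuming each factor has already been evaluated in log scale.

```java
class LogFactorSum {

    // Adds up factors that are already expressed in log scale.
    // Summing logs corresponds to multiplying the original factors.
    // A single factor equal to negative infinity (zero probability)
    // makes the whole sum negative infinity, provided no factor is +infinity.
    static double logDensity(double... logFactors) {
        double sum = 0.0;
        for (double logFactor : logFactors) {
            sum += logFactor;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Two factors with values 0.5 and 0.25: their product is 0.125.
        double total = logDensity(Math.log(0.5), Math.log(0.25));
        System.out.println(Math.exp(total)); // close to 0.125, up to rounding
    }
}
```

Working in log scale avoids numerical underflow when many small factors are multiplied together.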

##### Composite laws

Composite laws are declared as follows:

variableExpression1, variableExpression2, ... | conditioning1, conditioning2, ... ~ DistributionName(expression1, expression2, ...)

DistributionName refers to another Blang model, properly imported unless it is in an automatically imported package. Variables variableExpression1, variableExpression2, ... are matched from left to right in the same order as the random variables are declared in the imported Blang model. Each can be an arbitrary XExpression, which is executed once at initialization time. One important example is an XExpression that just returns a variable or parameter declared previously via the keyword random or param. Another important example is obtaining a variable from a collection of variables, e.g. someList.get(anIndex).

Each conditioning expression in conditioning1, conditioning2, ... can take one of two possible forms. First, it can be simply one of the names declared via the keyword random or param. Second, it can be a declaration of the form TypeName conditioningName = XExpression, where the XExpression is executed once at initialization time. An example of the latter from an HMM: IntVar prevState = states.get(time - 1).

Each argument in expression1, expression2, ... is an XExpression matched from left to right in the same order as the param variables are declared in the imported Blang model. In contrast to the other XExpressions discussed above, the XExpressions in the arguments are recomputed each time the factors are evaluated during inference. This corresponds to the natural behaviour one would expect from the mathematical notation. For example, if we write x | y ~ Normal(f(y), 1.0), we expect the expression for the mean, f(y), to be recomputed at each evaluation.

##### Atomic laws (numerical factors)

Atomic laws are declared as follows:

logf(expression1, expression2, ...) { .. }

The block { ... } contains an XExpression returning the probability for this factor as a double in log scale.

Each argument in expression1, expression2, ... can take one of two possible forms. First, it can be simply one of the names declared via the keyword random or param. Second, it can be a declaration of the form TypeName conditioningName = XExpression, where the XExpression is executed once at initialization time.

##### Atomic laws (indicator factors)

Indicator factors are used to mark the support of a distribution:

indicator(expression1, expression2, ...) { .. }

The block { ... } contains an XExpression returning a boolean value, whether the current configuration is in the support or not.

The arguments expression1, expression2, ... are defined as in numerical factors.

##### Atomic laws (constraint markers)

In certain cases, it is useful to mark a variable as having special constraints, to disable the standard sampling machinery and use specialized samplers instead. For example, this should be used when a simplex is used in a model to avoid attempting naive sampling of the individual entries.

variable is Constrained

##### Loops in laws block

Loops appearing at the top level of a laws block should use the following syntax:

for (IteratorType iteratorName : range) { ... }

The range is an XExpression returning an instance of Iterable. The range should not be random (i.e. it should not change during sampling). However, sampling of infinite-dimensional objects can be handled by creating dedicated types. Loops inside XExpressions (described later in this document) are much more general.

Any loops in Blang can be nested.

The optional generate block, an XExpression, provides an alternative but equivalent description of the same model. It is syntactically optional, but it is required in some runtime contexts.

If all the laws in the model are composite, and the components already provide generate blocks, it is not necessary to provide a generate block.

Conversely, if all the laws are atomic, providing a generate block is necessary for several runtime tasks, including:

1. Sequential change of measure methods, which use exact samples from the prior as the initial probability measure in a sequence of measures ending in the target posterior distribution.

2. Correctness tests, which rely on testing the equivalence between the laws and generate implementations.

The argument name nameOfRandomObject in generate(nameOfRandomObject) provides an arbitrary name for the input random number generator. No type is provided as the argument is always a subtype of java.util.Random. In practice, the runtime uses the subtype bayonet.distributions.Random, which provides a better algorithm (a Mersenne twister) as well as cross-compatibility between java.util.Random and Apache Commons' RandomGenerator objects, and more.

If the model has exactly one random variable of type IntVar or RealVar, then the generate block should return an integer or double respectively, corresponding to the new realization. Otherwise, the generate block should modify the random variable(s) in place.
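As an illustration of the contract for a model with a single real-valued random variable, here is a minimal Java sketch of a function that takes a java.util.Random and returns one new realization as a double. The class name, the method name and the choice of an exponential distribution are assumptions made for the example; this is not how Blang's built-in distributions are implemented.

```java
import java.util.Random;

class ExponentialGenerate {

    // Draws one realization from an exponential distribution with the given
    // positive rate, using inverse-CDF sampling:
    // if U is uniform on [0, 1), then -log(1 - U) / rate is exponential.
    static double generate(Random rand, double rate) {
        double u = rand.nextDouble();      // uniform in [0, 1)
        return -Math.log(1.0 - u) / rate;  // 1 - u lies in (0, 1], so the log is finite
    }

    public static void main(String[] args) {
        Random rand = new Random(42); // fixed seed for reproducibility
        System.out.println(generate(rand, 2.0));
    }
}
```

Taking the random number generator as an argument, rather than creating one inside the function, keeps the sampling reproducible when the caller controls the seed.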

### XExpressions

The syntax for XExpressions is provided by the Xtext language engineering framework.

XExpressions are also used by Xtend, an expressive language built on top of Java providing "powerful macros, lambdas, operator overloading and many more modern language features". We use Xtend to write some parts of the runtime machinery.

We review the main aspects of XExpressions relevant for writing Blang models here for completeness, following the structure of the official Xtend documentation.

Types can be categorized as follows:

• primitives, which are low-level building blocks. Those relevant here are boolean, int and double. They work as in Java;

• object references, which can be thought of as an annotated address to a memory location (possibly null);

• array references. This last category is rarely used directly in Blang. Instead, use higher-level constructs provided by the Java SDK, such as objects of type ArrayList or String.

The following expressions create constants of various types:

• boolean: true, false

• int: e.g. 42, 12_000

• double: make sure to include a decimal part, e.g. 1.0, or use scientific notation, e.g. 1.3e2

• type literals, e.g. String, which is equivalent to Java's String.class

• List: e.g. #[true, false]

• Set: e.g. #{"red", "blue", "green"}

• Pair, with arbitrary key type and value type: e.g. "likelihood" -> -123.43

• Map: e.g. #{"key" -> 1 ,"key2" -> 2}

Some examples:

val int myConstantInt = 17
var String myModifiableString
var typeInferred = #[1, 2, 3]

In the first example, val encodes that the variable cannot be changed (in the same sense as Java's final keyword for variables), while the other examples use var, encoding the fact that the variable can be changed afterwards.

The meaning of immutability is simple to understand in the case of primitives, but it should be interpreted carefully in the context of references. In the latter case, it means that the reference will always point to the same object in the heap; however, the internal state of that object might change over its lifetime.

In the third line of the example, the type is inferred automatically (here as List). Such automatic type inference is often, but not always, possible. However, we recommend avoiding this construct to maintain readability.
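Since val behaves like Java's final keyword for variables, the distinction between an immutable reference and an immutable object can be sketched in Java. The class and method names are hypothetical and serve only as an illustration.

```java
import java.util.ArrayList;
import java.util.List;

class FinalReferenceDemo {

    static List<Integer> demo() {
        // The reference is final: it will always point to this same list.
        final List<Integer> values = new ArrayList<>();
        // Allowed: the internal state of the object changes, not the reference.
        values.add(1);
        values.add(2);
        // values = new ArrayList<>(); // would not compile: cannot reassign a final variable
        return values;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [1, 2]
    }
}
```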

Conditional expressions have the form

if (condition) {
  // do something
}

Optionally, they can have an else clause. Also, the pair of if and else is an expression (i.e. it returns a value):

val String variable = if (condition) "firstString" else "secondString"

If else is not included, else null is used implicitly to maintain an expression interpretation.

Instead of chaining several if and else clauses, use a switch, which is significantly more flexible than Java's: it relies on calling .equals by default and does not require a break after each clause:

switch myString {
  case myString.isEmpty : "empty"
  case "match" : "the string is equal to 'match'"
  default : "This is a default case"
}

Several loop variants are allowed:

• High-level for loop, for (IteratorType iteratorName : range) { ... }, as in laws blocks but without restrictions on the range being fixed during sampling, e.g. for (String s : #["a", "b"]) { println(s) }. The type of the iterator can be skipped (not recommended for readability).

• Basic for loop, for (var IteratorType iteratorName = init; condition; update) { ... }, e.g. for (var int i = 0; i <= 10; i++) { ... }.

• While loops, while (condition) { ... }.

• Do-while, where the body is executed at least once, do { ... } while (condition).

Functions are called as in most languages, i.e. nameOfFunction(expression1, expression2, ...), where each element in expression1, expression2, ... is itself an XExpression. These expressions are evaluated first, then the results of these evaluations are passed to the function ("eager evaluation", as in Java for example). The only exception is the composite laws listed in laws { .. }, as described above: there, evaluation of the arguments is not performed once at initialization but is instead repeated each time the density is evaluated during MCMC sampling (a form of "lazy evaluation" in this unique special case).

In all cases, the actual function call only involves copying a constant size register so that function calls are always very cheap. For primitives, the value of the primitive is copied (and hence the original primitive can never suffer side effects from the call). For references, the memory address in the reference is copied (and hence the original reference cannot be changed, although the object it points to might have its state changed by the function call).
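Because these call semantics are those of Java, they can be illustrated with a short Java sketch. The class and method names are hypothetical; the example only demonstrates that primitives and references are both copied at the call site.

```java
class CallSemanticsDemo {

    // The primitive value is copied: only the local copy is modified.
    static void increment(int x) {
        x = x + 1;
    }

    // The reference is copied, but both copies point to the same object,
    // so the caller observes the change of state.
    static void append(StringBuilder builder) {
        builder.append("!");
    }

    // Reassigning the copied reference has no effect on the caller's reference.
    static void reassign(StringBuilder builder) {
        builder = new StringBuilder("other");
    }

    public static void main(String[] args) {
        int n = 1;
        increment(n);
        System.out.println(n); // prints 1

        StringBuilder text = new StringBuilder("hi");
        append(text);
        reassign(text);
        System.out.println(text); // prints hi!
    }
}
```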

To create your own function, create a separate Java or Xtend file. In Java, use:

package my.pack;
public class MyFunctions {
  public static ReturnType myFunction(ArgumentType1 arg1, ArgumentType2 arg2) {
    // some computation
    return result;
  }
}

In Xtend:

package my.pack
class MyFunctions {
  def static ReturnType myFunction(ArgumentType1 arg1, ArgumentType2 arg2) {
    // some computation
    return result
  }
}

Then, add import static my.pack.MyFunctions.* to your Blang file. You will now be able to call myFunction(arg1, arg2).

Objects are created by writing new NameOfClass(argument1, ...). This can be shortened to new NameOfClass if there are no arguments.

In some libraries, the call to new is wrapped inside a static function. In this case, just call the function to instantiate the object.

Classes have instance variables or fields (variables guaranteed to be available for a given type), as well as (instance) methods (functions associated with the object and having access to the object's instance variables). Collectively, fields and methods are called features.

Features are accessed using the "dot" notation: object.variable and object.method(...). When a method has no argument, the call can be shortened to object.method.

The ability to call a feature is subject to Java visibility constraints. In short, only public features can be called from outside the file declaring a class.

The special variable it provides a default receiver for feature calls. For example:

val it = someObject
it.doSomething
doSomething // equivalent short form

A "lambda expression" is an unfortunate name for a simple concept: a succinct way to write a function without having to give it a name. This makes it easy to call functions which take functions as arguments (e.g. to apply the function to each item in a list, etc.). Since they are so useful, many syntactic shortcuts are available.

Explicit syntax for lambda expressions is:

[Type1 argument1, ... | functionBody ]

For example, to capitalize words in a list:

#["one", "two"].map([String s | s.toUpperCase])

When there is a single input argument, you can skip declaring the argument, and instead the argument will be assigned to it (described in the previous section). This allows us to write for example:

#["one", "two"].map([toUpperCase])

Finally, when the last argument of a function is a function, you can simply put the lambda after the parentheses of the function call. For example:

#["one", "two"].map[toUpperCase]

Lambda expressions can also access final variables (i.e. marked by val) that are in scope.

Lambda expressions can be automatically cast to interfaces having a single declared method.
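Java applies the same conversion between lambdas and interfaces with a single declared method, which the following sketch illustrates. The interface Transformation and the helper mapAll are hypothetical names introduced for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LambdaDemo {

    // A hypothetical interface with a single declared method.
    interface Transformation {
        String apply(String input);
    }

    // Applies the transformation to each item and collects the results.
    static List<String> mapAll(List<String> items, Transformation transformation) {
        List<String> result = new ArrayList<>();
        for (String item : items) {
            result.add(transformation.apply(item));
        }
        return result;
    }

    public static void main(String[] args) {
        // The lambda is converted automatically to a Transformation.
        List<String> upper = mapAll(Arrays.asList("one", "two"), s -> s.toUpperCase());
        System.out.println(upper); // prints [ONE, TWO]
    }
}
```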

Type casts work as in Java, but with a more readable syntax: aDoubleVariable as int instead of (int) aDoubleVariable.

Boxing refers to wrapping a primitive such as int or double into an object such as Integer or Double. Deboxing is the reverse process. As in Java, the conversion between the two (boxing/deboxing) is automatic in the vast majority of the cases.
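The automatic conversions on the Java side can be seen in the following minimal sketch. The class and method names are hypothetical; the example covers only Java's Integer and Double wrappers, not Blang's own variable types.

```java
import java.util.ArrayList;
import java.util.List;

class BoxingDemo {

    // Sums a list of boxed integers into a primitive int.
    static int sum(List<Integer> boxedValues) {
        int total = 0;
        for (Integer value : boxedValues) {
            total += value; // automatic deboxing: Integer to int
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<>();
        values.add(3); // automatic boxing: int to Integer
        values.add(4);

        Double boxed = 1.5;       // automatic boxing: double to Double
        double primitive = boxed; // automatic deboxing: Double to double

        System.out.println(sum(values) + primitive); // prints 8.5
    }
}
```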

Blang adds boxing/deboxing to and from IntVar and RealVar.

The scope of a variable is the subset of the code in which it can be accessed. Scoping in Blang generally works as in most programming languages: to find the scope of a variable, identify the parent set of braces; these determine the region of the code where the variable can be accessed. If one variable reference is in the scope of several variables declared with the same name, the innermost set of braces has priority.

The only exceptions are the arguments of the atomic and composite laws, which require explicit identification of the variables to include in the scope. The variables to be included should be identified at the right of the | symbol. We make this modification because these scoping dependencies drive the inference of the sparsity patterns in the graphical model.

Operator overloading is permitted. One important case to be aware of is == which is overloaded to .equals(..). Use === for the low-level equality operator that checks if the two sides are identical (with the exception of Double.NaN, Not a Number, which following IEEE convention is never equal to anything).

When in the Blang IDE, command click on an operator to reveal its definition.
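To make the two notions of equality concrete, the following Java sketch contrasts content equality (what == means in an XExpression, according to the description above) with identity (what === means). The class and method names are hypothetical, and the sketch ignores null handling.

```java
class EqualityDemo {

    // Content equality: the behaviour described above for == in XExpressions.
    static boolean sameContent(Object a, Object b) {
        return a.equals(b);
    }

    // Identity: the behaviour described above for === in XExpressions
    // (written == in Java).
    static boolean sameObject(Object a, Object b) {
        return a == b;
    }

    // Primitive comparison; false for NaN under the IEEE convention.
    static boolean isEqualToItself(double x) {
        return x == x;
    }

    public static void main(String[] args) {
        String first = new String("abc");
        String second = new String("abc"); // distinct object, same content

        System.out.println(sameContent(first, second));  // prints true
        System.out.println(sameObject(first, second));   // prints false
        System.out.println(isEqualToItself(Double.NaN)); // prints false
    }
}
```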

Some useful operators automatically imported:

• object => lambdaExpression: calls the lambda expression with the input given by object, e.g. new ArrayList => [add("to be added in list")]

• range operators, for example 0 .. 10, 0 ..< 11, -1 >.. 10; all these examples return the integers $$0, 1, 2, ..., 10$$.

See the Xtend documentation if you want to overload operators in custom types.

Extension methods provide a kind of lightweight trait, i.e. adding methods to existing classes on demand.

This is done by adding an extension import:

import static extension my.namespace.myfunction

You can then write arg1.myfunction(arg2, ...) instead of myfunction(arg1, arg2, ...).

Types can be parameterized. For example, when using Java's List type, it is preferable to specify the type that will be stored in the list. To declare that strings will be stored, use List<String> as in Java or Xtend.

Models can use variables with type parameters but models themselves cannot have type parameters at the moment.

Throw exceptions to signal abnormal behaviour and to terminate the Blang runtime with an informative message:

throw new RuntimeException("Some error message.")

To signal that the current factor has invalid parameters, return the value -INFINITY if possible. If that is awkward for a given code structure, use blang.types.StaticUtils.invalidParameter instead, which will be caught and interpreted as the factor having zero probability.
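The first convention (returning negative infinity in log scale) can be sketched as a static Java function of the kind that could be imported into a Blang file. The class name, the method name and the choice of an exponential density are assumptions made for this example; the sketch uses Java's Double.NEGATIVE_INFINITY and does not call any Blang utility.

```java
class ExponentialLogDensity {

    // Log density of an exponential distribution with the given rate,
    // i.e. log(rate * exp(-rate * x)) for x >= 0.
    static double logDensity(double x, double rate) {
        if (!(rate > 0.0)) {
            // Invalid parameter (non-positive or NaN rate): zero probability in log scale.
            return Double.NEGATIVE_INFINITY;
        }
        if (x < 0.0) {
            // Outside the support of the distribution.
            return Double.NEGATIVE_INFINITY;
        }
        return Math.log(rate) - rate * x;
    }

    public static void main(String[] args) {
        System.out.println(logDensity(1.0, -2.0)); // prints -Infinity
        System.out.println(logDensity(0.0, 1.0));  // prints 0.0
    }
}
```

Returning negative infinity lets the sampler treat an invalid configuration as having zero probability instead of terminating the run with an exception.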

In contrast to Java, exceptions are never required to be declared or caught. If they need to be caught, the syntax is:

try {
  // code that might throw an exception
} catch (ExceptionType exceptionName) {
  // process exception
} // optionally:
finally {
  // code executed whether the exception is thrown or not
}

There are a few other aspects of XExpressions that we haven't covered here:

• the synchronized keyword and a rich parallelization library;

• optional dispatch methods, which allow mixing and matching static and runtime method polymorphism;

• active annotations, which, along with the reflection API, allow powerful meta-programming;

• built-in string templates.

Detailed descriptions of these features can be found in the Xtend documentation.

### Creating classes, interfaces, annotation interfaces and enumerations

As customary, Java types should be created in separate files and imported into Blang models as needed. The separate files can be written either in Java or in Xtend.