This page covers:

- Input: how to load data. This is used for:
  - fixing a random variable's value to a given observation (conditioning),
  - setting the hyper-parameters of models,
  - setting the tuning parameters of inference algorithms.
- Output: how to control the output of samples when custom types are used.
Inputs are controlled using the injection framework inits, which is designed for dependency injection in the context of scientific models. This is entirely automatic for existing Blang types; read this section only if you want to create custom data types and condition on them, i.e. load data and fix the variable to that loaded value.
To summarize, instantiation of arbitrary types is approached recursively with the following main schemes (a sketch illustrating them follows the list):
When instantiating a class: a constructor or static factory is selected by looking for the annotation @ProvidesFactory in the file declaring the custom type. See also the class Parsers, which contains examples for basic types, e.g. those from the JVM or from xlinear. As a fall-back, a no-argument constructor, if available, will be attempted.
Each argument of this constructor or static factory should be annotated as follows:
- For arguments to be read from the command line, use @ConstructorArg(value = "nameOfArg"). The type of each argument will be recursively inspected to figure out how to parse it. To bootstrap the process, you can also declare an argument @Input String string or @Input List<String> strings and parse the provided string or strings manually.
- To mark certain entries as observed, you can make the random variable immutable. Alternatively, you can mark subgraphs of the accessibility graph as observed by declaring a constructor argument @GlobalArg Observations initContext and then calling initContext.markAsObserved(object).
- To recursively parse other strings into arbitrary types, declare a constructor argument @InitService Creator creator and call creator.init(type, arguments), where type can be a class literal (such as String or Integer) or an instance of TypeLiteral. The arguments can be obtained via SimpleParser.parse(string) in most cases.
- As a short-hand, it is also possible to annotate fields with @Arg; these will be populated automatically after the constructor or static factory is called. For both @Arg and @ConstructorArg, you can give the argument a default value via @DefaultValue, or make it optional by enclosing the declared type in an Optional<..>.
When instantiating an interface, the following is also available: add the annotation @Implementations to the interface, with a comma-separated list of implementations, then follow the above process for each implementation.
Enumerations (enum) are taken care of automatically.
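To make this concrete, here is a minimal sketch in Java of a custom type wired for inits. The annotations are the ones described above; the type MyMatrix, its fields, the Matrix interface, and the implementation classes are hypothetical, and the package location of the annotations is an assumption:

```java
import java.util.Optional;
import blang.inits.*; // assumed package for Arg, ConstructorArg, DefaultValue, etc.

// Hypothetical custom type illustrating the annotations above.
public class MyMatrix {

  @Arg @DefaultValue("1.0")      // field populated after the factory is called
  public double scale;

  @Arg
  public Optional<String> label; // optional: may be absent from the command line

  public final int nRows, nCols;

  private MyMatrix(int nRows, int nCols) {
    this.nRows = nRows;
    this.nCols = nCols;
  }

  // Selected by inits thanks to the @ProvidesFactory annotation; each
  // argument becomes a (hierarchically named) command line switch.
  @ProvidesFactory
  public static MyMatrix build(
      @ConstructorArg(value = "nRows") Integer nRows,
      @ConstructorArg(value = "nCols") Integer nCols) {
    return new MyMatrix(nRows, nCols);
  }
}

// For an interface, list the implementations (shown schematically; see
// the inits README for the exact annotation syntax):
@Implementations({DenseMatrixImpl.class, SparseMatrixImpl.class})
interface Matrix {
  double get(int row, int col);
}
```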
For more information, see the README.md file in the inits repository.
As a convention, we use the string NA to mean unobserved (latent). This string can be accessed in a type-safe manner via NA::SYMBOL.
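For example, a custom parser can combine this convention with the Observations mechanism described above. A minimal sketch in Java, where MyReal and its set method are hypothetical and the import locations are assumptions:

```java
import blang.inits.*;              // assumed package: Input, GlobalArg, ProvidesFactory
import blang.io.NA;                // assumed location of the NA convention class
import blang.runtime.Observations; // assumed location of Observations

// Hypothetical mutable real variable; "set" is illustrative only.
public class MyReal {
  private double value = 0.0;
  public void set(double value) { this.value = value; }

  // The string "NA" leaves the variable latent; anything else is
  // parsed, and the resulting entry is marked as observed.
  @ProvidesFactory
  public static MyReal parse(
      @Input String string,
      @GlobalArg Observations initContext) {
    MyReal result = new MyReal();
    if (!string.equals(NA.SYMBOL)) { // NA::SYMBOL holds the string "NA"
      result.set(Double.parseDouble(string));
      initContext.markAsObserved(result);
    }
    return result;
  }
}
```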
Argument parsing is taken care of automatically (by introspection of the injection framework's annotations), and switches are named hierarchically.
Here is a concrete example to show how it works. Blang's main class declares an annotated field @Arg PosteriorInferenceEngine engine. This type declares a list of implementations.
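Schematically, the declaration looks like the following sketch (the exact set of implementations is an assumption and may differ across Blang versions):

```java
@Implementations({SCM.class, PT.class, Forward.class, Exact.class, None.class})
public interface PosteriorInferenceEngine {
  // ... engine contract omitted
}
```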
Now let's look at one of those implementations, say SCM. SCM's parent class is AdaptiveJarzynski, which declares @Arg Cores nThreads. In turn, Cores declares a static factory.
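A sketch of such a factory in Java (the actual parsing logic in Blang may differ; the "MAX" keyword is an assumption for illustration):

```java
import blang.inits.*; // assumed package for Input and ProvidesFactory

public class Cores {

  public final int number;

  public Cores(int number) { this.number = number; }

  // Parses either an explicit integer, or a keyword standing
  // for all available cores.
  @ProvidesFactory
  public static Cores parse(@Input String description) {
    if (description.trim().equals("MAX"))
      return new Cores(Runtime.getRuntime().availableProcessors());
    return new Cores(Integer.parseInt(description));
  }
}
```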
This creates the following command line options (described here by a snippet of what is produced by --help):
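An illustrative snippet (the option names follow the hierarchical naming scheme above; the descriptions are paraphrased, not verbatim --help output):

```
--engine <PosteriorInferenceEngine: SCM|PT|Forward|Exact|None|fully qualified name>
--engine.nThreads <Cores: an integer, or a keyword selecting all available cores>
```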
Every Blang execution creates a unique directory. Its path is printed to standard out at the end of the run, and the latest run is also symlinked at results/latest.
The directory has the following structure:
- samples/: samples from the target distribution. By default, each random variable in the running model is output at each iteration (to disable this for some variables, e.g. those that are fully observed, use --excludeFromOutput). We describe the format in more detail below.
- logNormalizationEstimate.csv: estimate of the natural logarithm of the probability of the data (also known as the log of the normalization constant of the prior times the likelihood, integrating over the latent variables). Only available for certain inference engines such as SCM.
- arguments*: arguments used in this run.
- executionInfo/: additional information for reproducibility (JVM arguments, standard out, etc.). To automatically record the code version, use --experimentConfigs.recordGitInfo true.
- monitoring/: diagnostics for the samplers.
The samples are stored in tidy CSV files. For example, two samples of a list of two RealVar's might look like this:
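An illustrative snippet (the column layout, with one index column per dimension plus sample and value columns, reflects the tidy format; the values shown are made up):

```
index_0,sample,value
0,0,0.53
1,0,1.10
0,1,0.43
1,1,0.87
```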
By default, the method toString is used to create the last column (value). This behaviour can be customized to follow the tidy philosophy: to do so, implement the interface TidilySerializable (see the Blang documentation for an example).
The following command line arguments can be used to tune the output:

- --excludeFromOutput: space-separated list of random variables to exclude from the output.
- --experimentConfigs.managedExecutionFolder: set to false to write output in the current folder instead of the unique folder created in results/all.
- --experimentConfigs.recordExecutionInfo: set to false to skip recording reproducibility information in executionInfo.
- --experimentConfigs.recordGitInfo: set to true to record git repository information for the code.
- --experimentConfigs.saveStandardStreams: set to false to skip recording the standard out and err streams.
- --experimentConfigs.tabularWriter: CSV by default. Can be set to Spark to organize the tidy output into a hierarchy of directories, each containing a CSV file with fewer columns (many columns become inferable from the names of the parent directories). In certain scenarios this can save disk space, and the result is inter-operable with Spark; a hypothetical illustration follows.
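For intuition, the Spark writer's layout for the list example above might look as follows (hypothetical: the exact directory and file naming produced by Blang may differ; the key=value directory convention shown is the one Spark reads natively):

```
samples/
  variableName/
    index_0=0/
      data.csv   # columns: sample,value
    index_0=1/
      data.csv
```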