User guide

A new user with new applications and experiments in mind will want to set up a custom configuration for BayesForest. This guide explains the customization that is possible with the default toolbox capabilities.

A user might want to recall the BayesForest outline. The main points of customization are:

QSM and corresponding data sets U_d
SSM and corresponding data sets U_m
distance measure
optimization algorithm

Below we discuss each of these points. Note that the QSM and SSM are probably the main targets for user customization. The distance measure and the optimization routine are less likely to need changes, as they were designed to meet the specific needs of clonal morphology generation.

It is important to realize that the structural data sets U_d and U_m, extracted from the QSM and SSM respectively, are the main focus of BayesForest: this is the stage where all the scientific action takes place.

QSM

The Quantitative Structure Model (QSM, first introduced by Raumonen et al., Rem Sens, 2013) is essentially a reconstruction of a real tree. QSM is used to extract structural feature data sets U_d.

The software described in Raumonen et al., Rem Sens, 2013 is available on GitHub:

TreeQSM.

There are some other interesting QSM-related packages on GitHub:

QSM-FaNNI for adding leaf cover to the bare tree skeleton
QSM Blender Addon for using QSMs in Blender 3D animations.

All these packages are made available by the Inverse Problems Research Group at Tampere University of Technology.

1. Native format

The native (text) format in which BayesForest accepts U_d data sets is the output of the surface reconstruction from Terrestrial Laser Scanning (TLS) data described in Raumonen et al., Rem Sens, 2013 (see the TreeQSM package). That algorithm takes a TLS point cloud (real measurements) as input and produces a QSM. The format is documented in the function import_qsm_data (see help import_qsm_data).

The native format describes the basic geometrical and topological relations of branches and of the segments that constitute them, for example the orientation vectors of segments, their lengths, the indices of their parent segments, branch orders, and angles. These relations can be combined into various advanced features that describe a tree in a slightly more sophisticated manner, for example the curvature of a branch in space or the tapering function of a branch.

The basic relations can describe any potential tree structure. Thus, they can be used in other applications/algorithms.

The basic relations are usually shipped in a text file or a mat file. If it is a text file, import_qsm_data reads the file and converts it to a mat file. The mat file contains the basic relations of the native format and can be processed further.

For instance, the Espoo maple example uses the text format with import_qsm_data, while the Ruotsinkylä pines example uses the binary mat file.

Native format processing (advanced features)

Two large data sets are used, one for branch-associated features and one for segment-associated features. Once the basic relations in the native format are saved in a mat file, gen_scatter2 can be used to generate the advanced feature data set U_d:

[bra, seg, tree] = gen_scatter2(<input-mat-file>);

The above command produces the branch (bra) and segment (seg) data and a tree representation in tree (the tree can be plotted with the tree.draw command). bra and seg are structs with two fields, info and data. The info field contains the feature codenames, and data contains the actual data points sorted by topological order: data is a cell array in which data{w} is the data table for order w. The codenames of the features are listed below.
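
The layout of bra and seg can be illustrated with a hand-made struct. This is a minimal sketch, not real gen_scatter2 output: it assumes that info is a cell array of codename strings and uses made-up numbers. How order values map to cell indices should be checked in help gen_scatter2.

```matlab
% Mock segment struct following the layout described above.
% ASSUMPTION: info is a cell array of codename strings; real output may differ.
seg.info = {'rad', 'len', 'gamma', 'zeta'};
seg.data = { ...
    [0.1316 0.0 0 0; 0.0670 0.2 0 93.3], ...      % first table (made-up values)
    [0.0401 0.0 0 0; 0.0388 0.2 -60.0 16.4] };    % second table (made-up values)

k = 2;                                   % which table of the cell array to read
col = find(strcmp(seg.info, 'len'));     % column index of the 'len' feature
len_k = seg.data{k}(:, col);             % the 'len' column of table k
disp(len_k');
```

Looking columns up by codename rather than by a fixed index keeps the code valid if features are added to the struct later.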

Currently, the advanced features are:

branch:
- branching angle, deg (codename bra)
- azimuthal angle, deg (az)
- full length of a branch, m (ltot)
- radius of the first segment in a branch, m (rini)
- distance from the beginning of the parent branch to the point where the current branch emanates, m (lapar)
segment:
- radius, m (codename rad)
- length, m (len)
- angle between the current segment and its parent in horizontal projection, deg (gamma)
- angle between the current segment and its parent in vertical projection, deg (zeta)

See help gen_scatter2 for details.

Arranging data sets

BayesForest compares data tables one pair at a time to produce a distance value. The data sets U_d and U_m are cell arrays with the same number of data tables, and each pair { Ud{x}, Um{x} } is compared. Normally the user does not need all the data sets stored in bra and seg by gen_scatter2. The arrange_scatter function extracts the required tables and allows a detailed specification of which data characteristics to extract. If the function is applied to both the QSM and the SSM data sets, it ensures that the same types of scatter are compared.
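
The pairwise comparison can be sketched as follows. The tables hold toy numbers, and toy_dist is a hypothetical placeholder (distance between column means) that only stands in for the real distance; it is not dt_distance and does not use its signature.

```matlab
% Two cell arrays with the same number of tables (toy numbers, not real data).
Ud = {[1 2; 3 4; 5 6], [0.1; 0.2; 0.3]};
Um = {[1 2; 3 5],      [0.2; 0.4]};
assert(numel(Ud) == numel(Um));

% Placeholder distance: Euclidean distance between column means.
% This is NOT dt_distance; it only stands in for it.
toy_dist = @(A, B) norm(mean(A, 1) - mean(B, 1));

d = zeros(1, numel(Ud));
for x = 1:numel(Ud)
    % Both tables of a pair must describe the same features (columns);
    % the number of rows (data points) may differ.
    assert(size(Ud{x}, 2) == size(Um{x}, 2));
    d(x) = toy_dist(Ud{x}, Um{x});
end
disp(d);
```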

See help arrange_scatter for further details.

Configuration file

In the configuration file one needs to specify the following options when the native format is used:

qsm_mat_file: a mat file for the basic relations
qsm_cyl_table: a text file or a variable in the Matlab workspace for segment data
qsm_br_table: a text file or a variable in the Matlab workspace for branch data

See help bf_process_input for details.

2. User-defined formats

Custom user data can be defined in BayesForest. It is preferable to use the native format and advanced features with the import_qsm_data and gen_scatter2 engines (see A and B below). This way one can describe the QSM of interest at the level of basic relations as well as at the level of advanced features.

Fully custom data is also supported (see C). In that case no visual representation is produced automatically, although the user can still define a QSM object to plot.

A. Make the native format yourself

The native format is essentially a naming convention for the basic relations. It is augmented (for text file formats) with the rules for variable arrangement in tables. It is an exhaustive list of geometrical and topological characteristics. Thus, it can describe any real tree.

If one describes QSM with this format after it was obtained by any means (not only TLS), BayesForest is able to utilize the data set in the subsequent analysis.

The text file format is described in help import_qsm_data.

This way one can be sure that gen_scatter2 understands the native format and produces the corresponding visual tree representation (plot and movie configuration options can be on).

Example
One can easily produce/convert a tree structure into a representation within BayesForest using the basic relations. We will make a simple tree structure with 6 cylinders/segments.

First, we need to make a segment table. The help for import_qsm_data describes the format as:

Radius | Len | Start (X,Y,Z) | Axis (X,Y,Z) | Parent | Extension | Seg Branch |

So we make 6 segments (each row specifies one segment) and save them in a file:

segment = [
0.05 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0 2 1;
0.03 1.0 0.0 0.0 1.0 0.0 0.0 1.0 1 0 1;
0.03 0.7 0.0 0.0 1.0 0.25 0.25 0.25 1 0 1;
0.03 0.7 0.0 0.0 1.0 -0.25 0.25 0.25 1 0 1;
0.03 0.7 0.0 0.0 1.0 -0.25 -0.25 0.25 1 0 1;
0.03 0.7 0.0 0.0 1.0 0.25 -0.25 0.25 1 0 1
];
dlmwrite('seg.dat',segment,' ');
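
Before saving, it can help to check the table against the column layout quoted above. The sketch below splits two rows of the example into named columns. The reading that 0 in the Parent or Extension column means "none" is inferred from the example, not taken from the toolbox documentation.

```matlab
% Two rows in the 11-column layout quoted above.
segment = [ ...
    0.05 1.0 0.0 0.0 0.0 0.0  0.0  1.0  0 2 1; ...
    0.03 0.7 0.0 0.0 1.0 0.25 0.25 0.25 1 0 1 ];
assert(size(segment, 2) == 11);

radius = segment(:, 1);
len    = segment(:, 2);
start  = segment(:, 3:5);    % start point (X,Y,Z)
ax     = segment(:, 6:8);    % axis direction (X,Y,Z)
parent = segment(:, 9);      % ASSUMPTION: 0 means no parent
ext    = segment(:, 10);     % ASSUMPTION: 0 means no extension
branch = segment(:, 11);     % index of the branch the segment belongs to

% Parent and extension indices must refer to existing rows (or be 0).
n = size(segment, 1);
assert(all(parent >= 0 & parent <= n));
assert(all(ext >= 0 & ext <= n));
```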

In principle, we do not need branch information here. For the sake of consistency, however, import_qsm_data does not accept segment data alone. The branch format is:

Br Order | Br Parent | Br Volume (not needed=0) | Br Length | Br Angle |

So we make 5 branches (one vertical axis branch and four radial branches) and save them:

branch = [
0 0 0 0.08 0;
1 1 0 0.03 45;
1 1 0 0.03 45;
1 1 0 0.03 45;
1 1 0 0.03 45
];
dlmwrite('bra.dat',branch,' ');

Finally, import these data and plot the resulting tree structure:

tr = import_qsm_data('bra.dat','seg.dat');
tr.draw;

This creates a figure with the tree and the mat file qsm_out.mat, which holds the data for subsequent analysis with gen_scatter2.

Any tree structure can be encoded in the same way with the native basic relation format. (Please refer to Notations for an explanation of the terminology.)

B. New features with native format

One can also use the existing machinery of gen_scatter2 to produce new combinations of basic relations and new features from the native QSM format. Since all geometrical and topological features are encoded in the basic relations, any derived feature can be built from them.

See help gen_scatter2 for how to add new features to the Branch and Segment structs.

This approach also ensures that the visual tree structure is produced during BayesForest operation (plot and movie on).

C. Custom tables

Custom user-defined data can be used by specifying a variable or a tabular file in the configuration file via the option qsm_table. This is suitable, for example, if the options above do not fit the user's needs, or if the conventional description of a tree in terms of branches with topological orders is not in use.

See help bf_process_input for details.

Note that in this case BayesForest does not have information on the tree structure corresponding to the custom data sets. Thus, configuration options plot and movie should be set to 0.

One can use the configuration option qsm_tree, which refers to a tree-class variable in the Matlab workspace, to provide a visual representation of the QSM during optimization (plot and movie can be set to 1 in this case). There are many ways of producing a tree-class structure, e.g. using basic relation tables as in A or using the tree class directly. Note that the SSM must produce a compatible tree-class structure to be visualized against the QSM.

SSM

SSM is an analytical/theoretical/heuristic/procedural model of the tree growth. The function that generates structural data sets U_m must produce the scatters compatible with the arrange_scatter output for QSM or with the custom user's data.

A user should write a function that takes as an input the array of parameters to be optimized and returns U_m.
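
As a rough illustration of that contract, the sketch below uses a hypothetical stand-in, ssm_stub, that maps a parameter vector to a cell array of scatter tables. It does not simulate growth, and the exact signature expected by ssm_fun (function name or handle, extra arguments) should be taken from help bf_process_input.

```matlab
% Hypothetical stand-in for an SSM: maps a parameter vector p to a cell
% array of scatter tables. A real model would simulate tree growth instead.
ssm_stub = @(p) { ...
    [p(1) * ones(3, 1), p(2) * (1:3)'], ...   % table 1: 3 points, 2 features
    p(3) * (1:4)' };                          % table 2: 4 points, 1 feature

Um = ssm_stub([0.05, 0.2, 10]);
disp(size(Um{1}));
disp(size(Um{2}));
```

The number of tables and the number of columns in each table must match what is extracted from the QSM, since the tables are compared pair by pair.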

The configuration option to specify the function is ssm_fun (see help bf_process_input).

An example of such a function is ssm_lpfg2, which runs a system external to Matlab: a model written in L+C and run in the LPFG simulator, which is part of the VLAB/L-studio software (see VLAB or L-studio at http://algorithmicbotany.org/virtual_laboratory/). External simulators are preferable because Matlab is quite slow at simulating tree growth, whose resource demands grow exponentially. LPFG is a good simulator with a wide range of modeling possibilities, and its models are compiled, which makes them faster. See help ssm_lpfg2 for details on how to set up a model in LPFG compatible with BayesForest.

The two LPFG models developed in our group and suitable for use with ssm_lpfg2 are the LIGNUM pine tree growth model and the Self-organizing tree growth model. They can serve as a basis for developing custom models that are consistent with BayesForest.

Data format

The LPFG models mentioned above use a specific scheme for writing the data. This format is read by the BayesForest function read_scatter_dat2. It is fully compatible with the info and data fields of the Branch and Segment data sets produced by gen_scatter2.

Example:

# Branch: bra az ltot rini lapar
# Segment: rad len gamma zeta
# order 0
0.131615 0 0 0
0.0670445 0.2 0 93.344
0.067031 0.408526 -100.854 97.7182
0.0667897 0.616695 122.648 64.2943


# order 1
14.8347 43.8704 7.97732 0.0400599 1.1938
40.0978 166.237 5.55932 0.0256525 3.18263
5.08404 83.8318 5.34703 0.0222212 3.38367


0.0400599 0 0 0
0.0388326 0.204462 -60.0237 16.4455
0.0367957 0.411766 110.257 92.6963
0.0367437 0.634008 -0.530325 -9.53711

The file starts with the header, which specifies the Branch and Segment data codenames (exactly as in gen_scatter2). The header appears only once, at the beginning of the file.

Next comes the order information in the form # order w, where w is the order value.

Below each order directive come two types of data sets: first Branch, then Segment. Note that there is no Branch data for order 0.

Also note the two empty lines that separate data sets. Non-data statements (directives) need not be separated this way.

The number of columns in a data set corresponds to the number of codenames specified in the header for this type of scatter data.

For instance, the second segment of order 0 in the example above has a radius of 0.0670445 m and a length of 0.2 m. The first branch of order 1 has a branching angle of 14.8347 deg, and the radius of its first segment is 0.0400599 m. Note that the first segment of order 1 has the same radius, because it is the first segment of the very first branch of this order.
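
A file in this layout can also be written from Matlab, for example by a custom SSM. The following is a minimal sketch under the rules stated above. The file name is hypothetical and the numbers are a subset of the example. Whether read_scatter_dat2 accepts this exact number formatting has not been verified here.

```matlab
% Toy tables: rows are data points, columns follow the header codenames.
seg0 = [0.131615 0 0 0; 0.0670445 0.2 0 93.344];          % order 0 segments
bra1 = [14.8347 43.8704 7.97732 0.0400599 1.1938];         % order 1 branches
seg1 = [0.0400599 0 0 0; 0.0388326 0.204462 -60.0237 16.4455];

fname = 'scatter_example.dat';   % hypothetical file name
fid = fopen(fname, 'w');
fprintf(fid, '# Branch: bra az ltot rini lapar\n');
fprintf(fid, '# Segment: rad len gamma zeta\n');
fprintf(fid, '# order 0\n');
% fprintf consumes elements column by column, so tables are transposed
% to print one data point per line.
fprintf(fid, '%g %g %g %g\n', seg0');
fprintf(fid, '\n\n');            % two empty lines close a data set
fprintf(fid, '# order 1\n');
fprintf(fid, '%g %g %g %g %g\n', bra1');
fprintf(fid, '\n\n');
fprintf(fid, '%g %g %g %g\n', seg1');
fclose(fid);
```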

See Notations and Native format processing for the terminology and codenames description, respectively.

SSM tree

The tree structure for visualization is used only once the best-fit SSM has been found, after all optimization routines are finished. Since producing and encoding the tree structure might be a long process, the user has the option to specify a separate best-fit SSM function via the configuration option ssm_fun_best. Usually it is a variant of the function in ssm_fun that, besides the usual simulation, writes an encoding of the output structure. Currently, the Multiscale Tree Graph (MTG) format is supported.

The best-fit SSM (configuration option ssm_fun_best) should produce a file named "out.mtg", which encodes the tree structure (function read_mtg understands the code).

The format:

/C0 1 0 0.459695 1.4964 (0,0,0) (0,0.459695,0) (0,1,0) 0 0 0 -1 4 (1) 0
<C4 2 0 0.461133 1.05851 (0,0.459695,0) (-0.00606877,0.920222,-0.0228507) (-0.0131606,0.998685,-0.0495534) 0 0 0 0 5 () 0
C0+C1 2 1 0.542425 0.745248 (0,0.459695,0) (-0.525203,0.324383,0.0087781) (-0.968251,-0.249457,0.0161831) 0 0 0 0 2 () 0
<C2 2 1 0.542425 0.737447 (-0.525203,0.324383,0.0087781) (-1.05827,0.225519,0.0258918) (-0.982743,-0.182265,0.0315504) 0 0 0 1 3 () 0

The first column has the form xCN, where N is the segment number (C stands for cylinder) and x indicates the topology of the segment: / marks the first segment of the tree, CM+ marks a child of the parent segment M, and < marks an extension of a segment previously defined with either / or CM+. In the above example, segment 0 is the first segment, segment 4 is its extension, segment 1 is a child of segment 0, and segment 2 is an extension of segment 1. See Notations for details of the terminology.

The subsequent columns describe the geometry and some of the topology of the segment. Vectors, such as the axis or the start position, are written in parentheses. Here is the list of features from left to right:
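
The labels of the first column can be decoded with a regular expression. This is a minimal sketch for the four labels of the example. It is not the parser used by read_mtg, and the variable names are illustrative.

```matlab
labels = {'/C0', '<C4', 'C0+C1', '<C2'};
kind = cell(size(labels));          % 'root', 'extension' or 'child'
seg_id = zeros(size(labels));       % segment number N
parent_id = nan(size(labels));      % parent number, known only for children

for i = 1:numel(labels)
    % Token 1: topology prefix (/, < or CM+); token 2: segment number N.
    tok = regexp(labels{i}, '^(/|<|C\d+\+)C(\d+)$', 'tokens', 'once');
    prefix = tok{1};
    seg_id(i) = str2double(tok{2});
    switch prefix(1)
        case '/'
            kind{i} = 'root';
        case '<'
            kind{i} = 'extension';
        otherwise                   % prefix has the form CM+
            kind{i} = 'child';
            parent_id(i) = str2double(prefix(2:end-1));
    end
end
disp(kind);
```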

age: number of iterations the segment has existed
order: topological order
length
radius
start position: vector
end position: vector
axis: vector
free placeholder (historically, the initial foliage mass of the segment in the Lignum model)
free placeholder (historically, the current foliage mass in the Lignum model)
free placeholder (historically, the heartwood radius in the Lignum model)
parent segment index
extension index
children list: vector of indices
logical flag: whether the segment was deleted during growth.
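
One line of the file can be split into these fields as sketched below. The sketch assumes that fields are separated by single spaces and that vectors contain no spaces, as in the example. It is not the read_mtg implementation, and the field names of the struct s are illustrative.

```matlab
row = '/C0 1 0 0.459695 1.4964 (0,0,0) (0,0.459695,0) (0,1,0) 0 0 0 -1 4 (1) 0';
f = strsplit(row, ' ');             % 15 whitespace-separated fields

s.label     = f{1};
s.age       = str2double(f{2});
s.order     = str2double(f{3});
s.len       = str2double(f{4});
s.radius    = str2double(f{5});
s.start     = sscanf(f{6}, '(%f,%f,%f)')';   % vectors are parsed to 1x3 rows
s.stop      = sscanf(f{7}, '(%f,%f,%f)')';
s.axis      = sscanf(f{8}, '(%f,%f,%f)')';
% f{9}..f{11} are the free placeholders and are skipped here.
s.parent    = str2double(f{12});
s.extension = str2double(f{13});
s.children  = sscanf(f{14}(2:end-1), '%f,')'; % empty for '()'
s.deleted   = logical(str2double(f{15}));

% Consistency check: end position = start position + length * axis.
gap = norm(s.start + s.len * s.axis - s.stop);
disp(gap);
```

The final check holds for the lines of the example and is a cheap way to detect a shifted or missing column.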

The following C++ snippets can be included in your own LPFG model code or modified for your own purposes:

cylinder/segment and branch classes
MTG file writer (uses the cylinder class of the above Gist-snippet).

See also: A Multiscale Model of Plant Topological Structures by C. Godin and Y. Caraglio.

Distance

The main distance function is dt_distance. During optimization optim_avg_distance is used to calculate average distances from multiple data sets when needed. The distance is described in Potapov et al. GigaScience. Consult help dt_distance for further reference.

Optimization algorithm

The user can control the behavior of the genetic algorithm (GA) using the configuration options starting with ga_. See help bf_process_input for more details. The GA options themselves are described on the dedicated Matlab pages.

BayesForest Toolbox

The Inverse Problems Research Group, Tampere University of Technology