User guide
A new user with particular applications and experiments in mind will likely want a custom configuration of BayesForest. This guide explains the customization that is possible with the default toolbox capabilities.
It may help to recall the BayesForest outline first. The main points of customization are:
- QSM and corresponding data sets Ud
- SSM and corresponding data sets Um
- distance measure
- optimization algorithm
Each point is discussed below. Note beforehand that QSM and SSM are probably the main targets for user customization. The distance measure and the optimization routine are less likely to need changes, as they were designed to meet the specific needs of clonal morphology generation.
It is important to realize that the structural data sets Ud and Um, extracted from QSM and SSM respectively, are the central objects of BayesForest: this is the stage where all the scientific action takes place.
The Quantitative Structure Model (QSM, first introduced by Raumonen et al., Rem Sens, 2013) is essentially a reconstruction of a real tree. QSM is used to extract structural feature data sets Ud.
The software described in Raumonen et al., Rem Sens, 2013 is available on GitHub:
There are some other interesting QSM-related packages on GitHub:
- QSM-FaNNI for adding leaf cover to the bare tree skeleton
- QSM Blender Addon for using QSMs in Blender 3D animations.
All these packages are made available by the Inverse Problems Research Group at Tampere University of Technology.
The native (text) format in which BayesForest accepts Ud data sets is the output of the surface reconstruction from Terrestrial Laser Scanning (TLS) data described in Raumonen et al., Rem Sens, 2013 (see the TreeQSM package). That algorithm takes a TLS point cloud (real measurements) as input and produces a QSM; the native format was developed for its output. The format is described in the function import_qsm_data (see help import_qsm_data).
The native format describes the basic geometrical and topological relations of branches and of the segments constituting them: for example, segment orientation vectors and lengths, the index of each segment's parent segment, branch orders and angles. These relations can be combined into various advanced features that describe a tree in a more sophisticated manner, for example, the curvature of a branch in space or its tapering function.
Since the basic relations can describe any tree structure, they can also be used in other applications and algorithms.
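To illustrate how basic relations combine into an advanced feature, the sketch below estimates a linear tapering function of one branch from its segment radii and lengths. It is not the gen_scatter2 implementation: the radii and lengths are hypothetical, and using segment midpoints as positions and a straight-line model are assumptions made only for this example.

```matlab
% Illustrative sketch (not BayesForest code): estimate a linear tapering
% function r(d) = a*d + b of one branch from its segment radii and lengths.
% Hypothetical values; segments are listed from the branch base outwards.
r = [0.050 0.045 0.040 0.035];    % segment radii, m
l = [0.5 0.5 0.5 0.5];            % segment lengths, m

% Distance from the branch base to the midpoint of each segment.
d = cumsum(l) - l / 2;

% Least-squares straight line through (d, r): p(1) is the slope
% (radius change per metre of branch), p(2) the radius at the base.
p = polyfit(d, r, 1);
fprintf('taper slope %.4f, base radius %.4f m\n', p(1), p(2));
```

A linear fit is the simplest choice; a nonlinear tapering model would replace the `polyfit` call while the distance computation stays the same.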
The basic relations are usually shipped in a text file or a mat file. If it is a text file, import_qsm_data reads it and converts it to a mat file, which contains the basic relations of the native format and can be processed further.
For instance, the Espoo maple example uses the text format with import_qsm_data, and the Ruotsinkylä pines example uses the binary mat file.
BayesForest uses two large data sets, one for branch-associated and one for segment-associated features. Once the basic relations in the native format are saved in a mat file, gen_scatter2 generates the advanced feature data set Ud:
[bra, seg, tree] = gen_scatter2(<input-mat-file>);
The above command produces the branch (bra) and segment (seg) data, and a tree representation in tree (which can be plotted with the tree.draw command). bra and seg are structs with two fields: info and data. The info field contains the feature codenames; data contains the actual data points sorted by topological order: data is a cell array, and data{w} is the data table for order w. The codenames of the features are listed below.
Currently, the advanced features are:
- branch:
  - branching angle, deg (codename bra)
  - azimuthal angle, deg (az)
  - full length of a branch, m (ltot)
  - radius of the first segment in a branch, m (rini)
  - distance from the beginning of the parent branch to the point where the current branch emanates, m (lapar)
- segment:
  - radius, m (codename rad)
  - length, m (len)
  - angle between the current segment and its parent in horizontal projection, deg (gamma)
  - angle between the current segment and its parent in vertical projection, deg (zeta)

See help gen_scatter2 for details.
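Because each data table's columns follow the codenames in info, a feature can be selected by name rather than by a hard-coded column index. The sketch below builds a hypothetical stand-in for the bra struct (the numbers are copied from the scatter file example further down this page) and assumes that info is a cell array of codenames in column order; check help gen_scatter2 for the actual layout.

```matlab
% Hypothetical stand-in for the branch struct returned by gen_scatter2.
% Assumption: info is a cell array of codenames and every data table has
% one column per codename, in the same order.
bra.info = {'bra', 'az', 'ltot', 'rini', 'lapar'};
bra.data = {[14.8347 43.8704 7.97732 0.0400599 1.1938;
             40.0978 166.237 5.55932 0.0256525 3.18263]};

% Look up a feature column by its codename instead of hard-coding its index.
col  = find(strcmp(bra.info, 'ltot'));   % full branch length, m
ltot = bra.data{1}(:, col);              % values from the first table
```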
BayesForest compares data tables one pair at a time to produce a distance value. The data sets Ud and Um are cell arrays with the same number of data tables, and each pair {Ud{x}, Um{x}} is compared. Normally the user does not want all the data sets stored in the bra and seg structs generated by gen_scatter2. To extract particular tables, use the arrange_scatter function, which allows an exhaustive description of the data characteristics to extract. If this function is used on both the QSM and SSM data sets, it ensures that the same types of scatters are compared.
See help arrange_scatter for further details.
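The pairwise structure of the comparison can be sketched as a loop over matching cells. Ud and Um below are hypothetical, and the per-pair measure (distance between column means) is only a placeholder for dt_distance, whose actual definition is given in Potapov et al. GigaScience and in help dt_distance.

```matlab
% Hypothetical Ud and Um: cell arrays holding the same number of data tables.
% Tables of a pair share their columns but may have different numbers of rows.
Ud = {[1 2; 2 3; 3 4], [0.1; 0.2]};
Um = {[1 2; 2 4],      [0.1; 0.3; 0.2]};
assert(numel(Ud) == numel(Um), 'Ud and Um must contain the same number of tables');

d = zeros(1, numel(Ud));
for x = 1:numel(Ud)
    % Placeholder distance, NOT dt_distance: Euclidean distance between the
    % column means of Ud{x} and Um{x}.
    d(x) = norm(mean(Ud{x}, 1) - mean(Um{x}, 1));
end
```

Since paired tables may differ in their number of rows, any per-pair measure has to compare the two point clouds as distributions rather than row by row.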
In the configuration file one needs to specify the following options when the native format is used:
- qsm_mat_file: a mat file with the basic relations
- qsm_cyl_table: a text file or a variable in the Matlab workspace with segment data
- qsm_br_table: a text file or a variable in the Matlab workspace with branch data

See help bf_process_input for details.
Custom user data can be defined in BayesForest. The preferable route is the native format and advanced features, handled by import_qsm_data and gen_scatter2 (see A and B below). This way one can describe the QSM of interest both at the level of basic relations and at the level of advanced features.
Custom data is also supported (see C), but then no visual representation is produced automatically (although the user can still define a visual QSM object to plot).
The native format is essentially a naming convention for the basic relations, augmented (for text file formats) with rules for arranging the variables in tables. It is an exhaustive list of geometrical and topological characteristics, so it can describe any real tree.
If a QSM obtained by any means (not only TLS) is described in this format, BayesForest can use the data set in the subsequent analysis.
The text file format is described in help import_qsm_data.
Using this format ensures that gen_scatter2 understands the data and produces the corresponding visual tree representation (the plot and movie configuration options can then be switched on).
Example
A tree structure can easily be produced in, or converted into, a BayesForest representation using the basic relations. We will make a simple tree structure with 6 cylinders/segments.
First, we need a segment table. The help of import_qsm_data describes its format as:
Radius | Len | Start (X,Y,Z) | Axis (X,Y,Z) | Parent | Extension | Seg Branch |
So we make 6 segments (each row specifies one segment) and save them to a file:
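% Columns: Radius | Len | Start (X,Y,Z) | Axis (X,Y,Z) | Parent | Extension | Seg Branch
segment = [
0.05 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0 2 1;        % segment 1: trunk base, extended by segment 2
0.03 1.0 0.0 0.0 1.0 0.0 0.0 1.0 1 0 1;        % segment 2: extension of segment 1
0.03 0.7 0.0 0.0 1.0 0.25 0.25 0.25 1 0 1;     % segments 3-6: radial children of segment 1
0.03 0.7 0.0 0.0 1.0 -0.25 0.25 0.25 1 0 1;
0.03 0.7 0.0 0.0 1.0 -0.25 -0.25 0.25 1 0 1;
0.03 0.7 0.0 0.0 1.0 0.25 -0.25 0.25 1 0 1
];
dlmwrite('seg.dat',segment,' ');   % space-delimited text file for import_qsm_data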
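Before importing, it can be worth checking that the table is geometrically consistent. The sketch below (not part of BayesForest) computes each segment's end point as start + length * direction and checks that every child or extension starts where its parent ends. It assumes that the Axis columns give only a direction, so they are normalized here (the radial axes in the table are not unit vectors), and that Parent holds 1-based row indices with 0 meaning no parent.

```matlab
% Sanity check for the segment table above (illustrative, not BayesForest code).
% Columns: Radius Len StartX StartY StartZ AxisX AxisY AxisZ Parent Extension SegBranch
segment = [
    0.05 1.0 0.0 0.0 0.0  0.0   0.0  1.0  0 2 1;
    0.03 1.0 0.0 0.0 1.0  0.0   0.0  1.0  1 0 1;
    0.03 0.7 0.0 0.0 1.0  0.25  0.25 0.25 1 0 1;
    0.03 0.7 0.0 0.0 1.0 -0.25  0.25 0.25 1 0 1;
    0.03 0.7 0.0 0.0 1.0 -0.25 -0.25 0.25 1 0 1;
    0.03 0.7 0.0 0.0 1.0  0.25 -0.25 0.25 1 0 1
];
startP = segment(:, 3:5);
axisV  = segment(:, 6:8);
% Assumption: the axis gives only the direction, so it is normalized here.
axisU  = bsxfun(@rdivide, axisV, sqrt(sum(axisV.^2, 2)));
endP   = startP + bsxfun(@times, segment(:, 2), axisU);

% Every segment with a parent (index > 0) should start where its parent ends.
kids = find(segment(:, 9) > 0);
par  = segment(kids, 9);
gap  = max(max(abs(startP(kids, :) - endP(par, :))));
fprintf('largest start/end mismatch: %g\n', gap);
```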
Strictly speaking, branch information is not needed here, but for consistency import_qsm_data does not accept segment data alone. The branch format is:
Br Order | Br Parent | Br Volume (not needed=0) | Br Length | Br Angle |
So we make 5 branches (one vertical axis branch and four radial branches) and save them:
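% Columns: Br Order | Br Parent | Br Volume (not needed = 0) | Br Length | Br Angle
branch = [
0 0 0 0.08 0;      % branch 1: vertical axis (order 0, no parent)
1 1 0 0.03 45;     % branches 2-5: radial branches of order 1, children of branch 1
1 1 0 0.03 45;
1 1 0 0.03 45;
1 1 0 0.03 45
];
dlmwrite('bra.dat',branch,' ');   % space-delimited text file for import_qsm_data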
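A similar check can be applied to the branch table. This sketch is illustrative and rests on two assumptions not stated by import_qsm_data: parent indices are 1-based row numbers of the branch table (0 marks the root), and a child branch is exactly one order higher than its parent.

```matlab
% Consistency check for the branch table above (illustrative, not BayesForest code).
% Columns: Order Parent Volume Length Angle; parent 0 marks the root branch.
branch = [
    0 0 0 0.08  0;
    1 1 0 0.03 45;
    1 1 0 0.03 45;
    1 1 0 0.03 45;
    1 1 0 0.03 45
];
ord   = branch(:, 1);
par   = branch(:, 2);
child = find(par > 0);

% Assumptions: parent indices are 1-based row numbers of this table, and a
% child branch is exactly one order higher than its parent.
validIdx = all(par(child) <= size(branch, 1));
ok = validIdx && all(ord(par(child)) == ord(child) - 1);
fprintf('branch table consistent: %d\n', ok);
```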
Finally, import these data and plot the resulting tree structure:
tr = import_qsm_data('bra.dat','seg.dat');
tr.draw;
This creates a figure with the tree and the mat file qsm_out.mat, which holds the data for subsequent analysis with gen_scatter2.
In the same way, any tree structure can be encoded in the native basic-relation format. (Please refer to Notations for an explanation of the terminology.)
One can also use the existing machinery of gen_scatter2 to produce new combinations of basic relations and new features from the native QSM format. Note that all geometrical and topological features are encoded in the basic relations, so any combination of derived features can be obtained.
See help gen_scatter2 to proceed this way and to add new features to the Branch and Segment structs.
This also ensures that the tree's visual structure is produced during BayesForest operation (plot and movie on).
Custom user-defined data can be used by specifying a variable or a tabular file in the configuration file via the option qsm_table. This option may be suitable, for example, if the options above do not fit the user's needs, or if the conventional description of a tree in terms of branches with topological orders is not used.
See help bf_process_input for details.
Note that in this case BayesForest has no information on the tree structure corresponding to the custom data sets, so the configuration options plot and movie should be set to 0.
The configuration option qsm_tree can refer to a tree-class variable in the Matlab workspace to be used as the visual representation of the QSM during optimization (plot and movie can be set to 1 in this case). There are many ways to produce a tree-class structure, e.g. from basic relation tables as in A, or by using the tree class directly. Note that the SSM must produce a compatible tree-class structure to be visualized against the QSM.
See also help tree.
SSM is an analytical, theoretical, heuristic or procedural model of tree growth. The function that generates the structural data sets Um must produce scatters compatible with the arrange_scatter output for the QSM, or with the custom user data.
The user should write a function that takes as input the array of parameters to be optimized and returns Um.
The configuration option that specifies this function is ssm_fun (see help bf_process_input).
An example of such a function is ssm_lpfg2, which simulates a system external to Matlab, written in L+C and run in the LPFG simulator, which is part of the VLAB/L-studio software (see VLAB or L-studio at http://algorithmicbotany.org/virtual_laboratory/). External simulators are preferable, as Matlab is quite slow at simulating tree growth, whose resource demands grow exponentially. LPFG is a good simulator with a wide range of modeling possibilities, and its compiled code makes the models faster. See help ssm_lpfg2 for details on how to set up an LPFG model compatible with BayesForest.
The two LPFG models developed in our group and suitable for use with ssm_lpfg2 are the LIGNUM pine tree growth model and the Self-organizing tree growth model. These models can serve as a starting point for custom models that are consistent with BayesForest.
The LPFG models mentioned above write their data in a specific format, which is read by the BayesForest function read_scatter_dat2. This format is fully compatible with the info and data fields of the Branch and Segment data sets produced by the gen_scatter2 function.
Example:
# Branch: bra az ltot rini lapar
# Segment: rad len gamma zeta
# order 0
0.131615 0 0 0
0.0670445 0.2 0 93.344
0.067031 0.408526 -100.854 97.7182
0.0667897 0.616695 122.648 64.2943
# order 1
14.8347 43.8704 7.97732 0.0400599 1.1938
40.0978 166.237 5.55932 0.0256525 3.18263
5.08404 83.8318 5.34703 0.0222212 3.38367
0.0400599 0 0 0
0.0388326 0.204462 -60.0237 16.4455
0.0367957 0.411766 110.257 92.6963
0.0367437 0.634008 -0.530325 -9.53711
The file starts with a header, given only once, which specifies the Branch and Segment data codenames (exactly as in gen_scatter2).
Next comes the order directive, # order w, where w is the order value.
Below the order directive come two data sets: first Branch, then Segment. Note that order 0 has no Branch data.
Also note the two empty lines separating the data sets (non-data statements, i.e. directives, may not be separated).
The number of columns in a data set equals the number of codenames specified in the header for that type of scatter data.
For instance, the second segment of order 0 in the example above has radius 0.0670445 m and length 0.2 m, and the first branch of order 1 has a branching angle of 14.8347 deg and a first-segment radius of 0.0400599 m (note that the first segment of order 1 has the same radius, as it belongs to the very first branch of this order).
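The format rules above can be turned into a small reader. This is a minimal sketch, not read_scatter_dat2, and it rests on stated assumptions: blank lines separate data sets, order 0 holds only Segment data, and for order w > 0 the first data set is Branch data and the second is Segment data. It stores order w at cell index w+1, which is this sketch's own convention. The test input is a shortened copy of the example with blank separator lines added.

```matlab
% Minimal sketch of a reader for the scatter text format (not read_scatter_dat2).
% Assumptions: blank lines separate data sets; order 0 holds only Segment data;
% for order w > 0 the first data set is Branch data and the second is Segment
% data. Order w is stored at cell index w+1 (this sketch's own convention).
txt = sprintf(['# Branch: bra az ltot rini lapar\n', ...
               '# Segment: rad len gamma zeta\n', ...
               '# order 0\n', ...
               '0.131615 0 0 0\n', ...
               '0.0670445 0.2 0 93.344\n', ...
               '\n', ...
               '# order 1\n', ...
               '14.8347 43.8704 7.97732 0.0400599 1.1938\n', ...
               '\n', ...
               '0.0400599 0 0 0\n']);

txtLines = regexp(txt, '\r?\n', 'split');
bra = struct('info', {{}}, 'data', {{}});
seg = struct('info', {{}}, 'data', {{}});
order = -1;       % current order (none read yet)
block = 0;        % number of data sets seen within the current order
inBlock = false;  % true while consecutive data lines are being read
for k = 1:numel(txtLines)
    ln = strtrim(txtLines{k});
    if isempty(ln)
        inBlock = false;                 % a blank line closes the data set
        continue;
    end
    tok = regexp(ln, '^#\s*(Branch|Segment):\s*(.*)$', 'tokens', 'once');
    if ~isempty(tok)                     % header: list of codenames
        names = regexp(strtrim(tok{2}), '\s+', 'split');
        if strcmp(tok{1}, 'Branch'), bra.info = names; else, seg.info = names; end
        continue;
    end
    tok = regexp(ln, '^#\s*order\s+(\d+)', 'tokens', 'once');
    if ~isempty(tok)                     % order directive starts a new order
        order = str2double(tok{1});
        block = 0;
        inBlock = false;
        bra.data{order + 1} = zeros(0, numel(bra.info));
        seg.data{order + 1} = zeros(0, numel(seg.info));
        continue;
    end
    if ln(1) == '#', continue; end       % ignore any other directive
    if ~inBlock                          % first line of a new data set
        block = block + 1;
        inBlock = true;
    end
    row = sscanf(ln, '%f')';
    if order > 0 && block == 1
        bra.data{order + 1}(end + 1, :) = row;
    else
        seg.data{order + 1}(end + 1, :) = row;
    end
end
```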
See Notations and Native format processing for the terminology and codenames description, respectively.
The tree structure for visualization is used only when the best-fit SSM is found, after all optimization routines have finished. Since producing and encoding the tree structure may take a long time, the user can specify a separate best-fit SSM function via the configuration option ssm_fun_best. Usually it is a variant of the ssm_fun function that, besides the usual simulation, writes out an encoding of the output tree structure. Currently, the Multiscale Tree Graph (MTG) format is supported.
The best-fit SSM (configuration option ssm_fun_best) should produce a file named "out.mtg", which encodes the tree structure (the function read_mtg understands this encoding).
The format:
/C0 1 0 0.459695 1.4964 (0,0,0) (0,0.459695,0) (0,1,0) 0 0 0 -1 4 (1) 0
<C4 2 0 0.461133 1.05851 (0,0.459695,0) (-0.00606877,0.920222,-0.0228507) (-0.0131606,0.998685,-0.0495534) 0 0 0 0 5 () 0
C0+C1 2 1 0.542425 0.745248 (0,0.459695,0) (-0.525203,0.324383,0.0087781) (-0.968251,-0.249457,0.0161831) 0 0 0 0 2 () 0
<C2 2 1 0.542425 0.737447 (-0.525203,0.324383,0.0087781) (-1.05827,0.225519,0.0258918) (-0.982743,-0.182265,0.0315504) 0 0 0 1 3 () 0
The first column is xCN, where N is the segment number (C stands for cylinder) and x indicates the topology of the segment: / marks the first segment of the tree, CN+ (as in C0+C1) marks a child of the parent segment CN, and < marks an extension of a segment previously defined with either / or CN+. In the above example, segment 0 is the first segment, segment 4 is its extension, segment 1 is a child of segment 0, and segment 2 is an extension of segment 1. See Notations for details of the terminology.
The subsequent columns describe the geometry and some topology of the segment. Vectors, such as the axis or the start position, are given in parentheses. Here is the list of features from left to right:
- age: number of iterations the segment has existed
- order: topological order
- length
- radius
- start position: vector
- end position: vector
- axis: vector
- free placeholder (historically, the initial foliage mass of the segment in the Lignum model)
- free placeholder (historically, the current foliage mass in the Lignum model)
- free placeholder (historically, the heartwood radius in the Lignum model)
- parent segment index
- extension index
- children list: vector of indices
- logical: whether the segment was deleted during growth
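For checking the output of an ssm_fun_best model, the column list above is enough to parse individual MTG lines. The sketch below is not read_mtg; the struct field names are hypothetical, chosen for this example, and the three free placeholder columns are skipped. It parses the four example lines shown earlier.

```matlab
% Minimal sketch of a parser for MTG lines in the format above (not read_mtg).
% Field names of the resulting struct are this sketch's own (hypothetical).
mtg = {
  '/C0 1 0 0.459695 1.4964 (0,0,0) (0,0.459695,0) (0,1,0) 0 0 0 -1 4 (1) 0'
  '<C4 2 0 0.461133 1.05851 (0,0.459695,0) (-0.00606877,0.920222,-0.0228507) (-0.0131606,0.998685,-0.0495534) 0 0 0 0 5 () 0'
  'C0+C1 2 1 0.542425 0.745248 (0,0.459695,0) (-0.525203,0.324383,0.0087781) (-0.968251,-0.249457,0.0161831) 0 0 0 0 2 () 0'
  '<C2 2 1 0.542425 0.737447 (-0.525203,0.324383,0.0087781) (-1.05827,0.225519,0.0258918) (-0.982743,-0.182265,0.0315504) 0 0 0 1 3 () 0'
};
vec = @(s) sscanf(s(2:end-1), '%f,')';   % '(a,b,c)' -> [a b c], '()' -> []
for k = 1:numel(mtg)
    t = regexp(mtg{k}, '\([^)]*\)|\S+', 'match');   % a vector stays one token
    switch t{1}(1)
        case '/',  topo = 'first';      % first segment of the tree
        case '<',  topo = 'extension';  % extension of an earlier segment
        otherwise, topo = 'child';      % CN+CM: segment M is a child of CN
    end
    % Tokens 9-11 are the free placeholder columns and are skipped.
    segs(k) = struct( ...
        'id',        str2double(regexp(t{1}, '(\d+)$', 'tokens', 'once')), ...
        'topology',  topo, ...
        'age',       str2double(t{2}), ...
        'order',     str2double(t{3}), ...
        'len',       str2double(t{4}), ...
        'radius',    str2double(t{5}), ...
        'startPos',  vec(t{6}), ...
        'endPos',    vec(t{7}), ...
        'axisDir',   vec(t{8}), ...
        'parent',    str2double(t{12}), ...
        'extension', str2double(t{13}), ...
        'children',  vec(t{14}), ...
        'deleted',   str2double(t{15}) ~= 0);
end
```

Keeping each parenthesized vector as a single token makes the column positions fixed, so the fields can be read by index exactly as listed above.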
The following C++ snippets can be included in your own LPFG model code or modified for your own purposes:
- cylinder/segment and branch classes
- MTG file writer (uses the cylinder class from the Gist snippet above).
See also: A Multiscale Model of Plant Topological Structures by C. Godin and Y. Caraglio.
The main distance function is dt_distance. During optimization, optim_avg_distance is used to calculate average distances over multiple data sets when needed. The distance is described in Potapov et al. GigaScience. Consult help dt_distance for further reference.
The user can control the behavior of the genetic algorithm (GA) using the configuration options starting with ga_. See help bf_process_input for more details. The GA options are described on the dedicated Matlab pages.
BayesForest Toolbox
The Inverse Problems Research Group, Tampere University of Technology
Copyright 2013-2017 Ilya Potapov, contribute and distribute freely