User guide

inuritdino edited this page Jun 24, 2017 · 24 revisions

New users with their own applications and experiments in mind will usually want a custom configuration of BayesForest. This guide explains the customization available through the default toolbox capabilities.

A user might want to recall the BayesForest outline. The main points of customization are (in descending order of relevance):

  • QSM and corresponding data sets Ud
  • SSM and corresponding data sets Um
  • distance measure
  • optimization algorithm

Below we discuss each of these points. Note that QSM and SSM are likely the main targets for user customization. The distance and the optimization routine are less important, as they were designed to meet the specific needs of clonal morphology generation.

It is important to realize that the structural data sets Ud and Um, extracted from QSM and SSM respectively, are the main target of BayesForest. This is the stage where all the scientific action takes place.

QSM

The Quantitative Structure Model (QSM) is essentially a reconstruction of a real tree. QSM is used to extract structural feature data sets Ud.

1. Native format

The native (text) format in which BayesForest accepts Ud data sets is the result of a surface reconstruction from Terrestrial Laser Scanning (TLS) data described in Raumonen et al., Rem Sens, 2013. The format is documented in the function import_qsm_data (see help import_qsm_data). It is specific to the algorithm described in Raumonen et al., Rem Sens, 2013, which takes a TLS point cloud (real measurements) as input and produces a QSM.

The native format describes the basic geometrical and topological relations of branches and the segments that constitute them: for example, orientation vectors of segments, lengths, indices of the segments' parent segments, branch orders and angles. These relations can be combined further into various advanced features that describe a tree in a slightly more sophisticated manner, such as the curvature of a branch in space or the tapering function of a branch.
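To illustrate how a basic relation can be turned into a derived feature, the sketch below computes the angle between each segment's axis and the axis of its parent segment. It is only an illustration and not the gen_scatter2 implementation. The variable names (axis_vec, parent, ang) and the toy values are assumptions, as is the convention that a parent index of 0 means "no parent".

```matlab
% Toy basic relations (illustrative values, not from a real QSM):
% one axis vector per segment and the index of the parent segment.
axis_vec = [0 0 1;    % segment 1: vertical
            0 0 1;    % segment 2: continues segment 1
            1 0 1];   % segment 3: leans away from segment 1
parent   = [0; 1; 1]; % assumed convention: 0 means no parent

n   = size(axis_vec, 1);
ang = nan(n, 1);      % angle to parent in degrees, NaN where undefined
for k = 1:n
    if parent(k) > 0
        a = axis_vec(k, :) / norm(axis_vec(k, :));
        b = axis_vec(parent(k), :) / norm(axis_vec(parent(k), :));
        c = max(-1, min(1, dot(a, b)));  % clamp against rounding error
        ang(k) = acos(c) * 180 / pi;
    end
end
```

The axis vectors are normalized before the dot product, so the sketch does not rely on the table storing unit vectors.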

The basic relations are usually shipped in a text file or a mat file. If it is a text file, import_qsm_data reads the file and transforms it into a mat file. The mat file contains the basic relations of the native format and can be processed further.

For instance, the Espoo maple example uses the text format with import_qsm_data, and the Ruotsinkylä pines example uses the binary mat file.

Native format processing (advanced features)

Two large data sets are used: one for branch-associated and one for segment-associated features. Once the basic relations in the native format are saved in a mat file, one can use gen_scatter2 to generate the advanced feature data set Ud:

[bra, seg, tree] = gen_scatter2(<input-mat-file>);

The above command produces the branch (bra) and segment (seg) data, and a tree representation in tree (which can be plotted with the tree.draw command). bra and seg are structs with two fields: info and data. The info field contains the feature codenames. The data field contains the actual data points sorted by topological order: data is a cell array in which data{w} is the data table for order w. The codenames of the features are listed below.
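As a rough sketch of how such a struct might be queried, the snippet below builds a mock seg struct and extracts one feature column by its codename. The mock is hypothetical. It assumes that info is a cell array of codename strings and that the columns of each table follow the order of the codenames. How the cell index maps to the order value (for example, whether order 0 is stored in data{1}) should be checked in help gen_scatter2.

```matlab
% Mock of the layout described above. Real output comes from gen_scatter2.
seg.info = {'rad', 'len', 'gamma', 'zeta'};   % assumed: cell array of codenames
seg.data = { [0.13  0.0   0.0  0.0;           % first per-order table
              0.067 0.2   0.0 93.3], ...
             [0.04  0.0   0.0  0.0;           % second per-order table
              0.039 0.2 -60.0 16.4] };

col = find(strcmp(seg.info, 'len'));  % column index for codename 'len'
w   = 1;                              % index into the cell array of tables
len_w    = seg.data{w}(:, col);       % all segment lengths in that table
mean_len = mean(len_w);
```

Looking the column up by codename avoids hard-coding column positions, which may change if features are added.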

Currently, the advanced features are:

  • branch:
    • branching angle, deg (codename bra)
    • azimuthal angle, deg (az)
    • full length of a branch, m (ltot)
    • radius of the first segment in a branch, m (rini)
    • distance from the beginning of the parent branch to the point where the current branch emanates, m (lapar)
  • segment:
    • radius, m (codename rad)
    • length, m (len)
    • angle between the current segment and its parent in horizontal projection, deg (gamma)
    • angle between the current segment and its parent in vertical projection, deg (zeta)

See help gen_scatter2 for detail.

Arranging data sets

BayesForest compares data tables one pair at a time to produce a distance value. The data sets Ud and Um are cell arrays with the same number of data tables, and each pair { Ud{x}, Um{x} } is compared. Normally a user does not want all the data sets that gen_scatter2 stores in bra and seg. To extract only the tables of interest, one can use the arrange_scatter function, which allows a detailed specification of the data characteristics to extract. Applying this function to both the QSM and SSM data sets ensures that the same types of scatter are compared.

See help arrange_scatter for further details.
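The pairwise comparison described above can be sketched as a loop over matching tables. The data and the distance function here are placeholders: toy_dist is a hypothetical stand-in (Euclidean distance between column means) used only to make the loop runnable. It is not dt_distance, and the tables are not output of arrange_scatter.

```matlab
% Toy data sets: two tables each, with matching feature columns per pair.
Ud = { [1 2; 3 4; 5 6], [0.1; 0.2; 0.3] };
Um = { [1 2; 3 5],      [0.2; 0.4] };

assert(numel(Ud) == numel(Um), 'Ud and Um need the same number of tables');

% Placeholder distance (hypothetical): distance between column means.
toy_dist = @(A, B) norm(mean(A, 1) - mean(B, 1));

d = zeros(1, numel(Ud));
for x = 1:numel(Ud)
    % Each pair must describe the same features, hence the same columns.
    assert(size(Ud{x}, 2) == size(Um{x}, 2), 'column mismatch in pair %d', x);
    d(x) = toy_dist(Ud{x}, Um{x});
end
```

The tables in a pair may have different numbers of rows (data points), but they need the same number of columns (features).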

Configuration file

In the configuration file one needs to specify the following options when the native format is used:

  • qsm_mat_file: a mat file for the basic relations
  • qsm_cyl_table: a text file or a variable in the Matlab workspace for segment data
  • qsm_br_table: a text file or a variable in the Matlab workspace for branch data

See help bf_process_input for detail.

2. User-defined formats

Custom user data can be defined in BayesForest. It is preferable to use the native format and the advanced features with the import_qsm_data and gen_scatter2 engines (see A and B below). This way one can describe the QSM of interest at the level of basic relations as well as at the level of advanced features.

Fully custom data is also supported (see C). In that case no visual representation is produced automatically, although the user can still define a QSM object to plot.

A. Make the native format yourself

The native format is essentially a naming convention for the basic relations. It is augmented (for text file formats) with the rules for variable arrangement in tables. It is an exhaustive list of geometrical and topological characteristics. Thus, it can describe any real tree.

If a QSM is described in this format, however it was obtained (not only from TLS), BayesForest can use the data set in the subsequent analysis.

The text file format is documented in help import_qsm_data.

This way one can be sure that gen_scatter2 understands the data and produces the corresponding visual tree representation (the plot and movie configuration options can be on).

Example
One can easily produce/convert a tree structure into a representation within BayesForest using the basic relations. We will make a simple tree structure with 6 cylinders/segments.

First, we need to make a segment table. The import_qsm_data describes the format as:

Radius | Len | Start (X,Y,Z) | Axis (X,Y,Z) | Parent | Extension | Seg Branch |

So we make 6 segments (each row specifies one segment) and save them in a file:

segment = [
0.05 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0 2 1;
0.03 1.0 0.0 0.0 1.0 0.0 0.0 1.0 1 0 1;
0.03 0.7 0.0 0.0 1.0 0.25 0.25 0.25 1 0 1;
0.03 0.7 0.0 0.0 1.0 -0.25 0.25 0.25 1 0 1;
0.03 0.7 0.0 0.0 1.0 -0.25 -0.25 0.25 1 0 1;
0.03 0.7 0.0 0.0 1.0 0.25 -0.25 0.25 1 0 1
];
dlmwrite('seg.dat',segment,' ');

In principle, we do not need branch information here. For the sake of consistency, however, import_qsm_data does not accept segment data alone. The branch format is:

Br Order | Br Parent | Br Volume (not needed=0) | Br Length | Br Angle |

So we make 5 branches (one vertical axis branch and four radial branches) and save them:

branch = [
0 0 0 0.08 0;
1 1 0 0.03 45;
1 1 0 0.03 45;
1 1 0 0.03 45;
1 1 0 0.03 45
];
dlmwrite('bra.dat',branch,' ');

Finally, import these data and plot the resulting tree structure:

tr = import_qsm_data('bra.dat','seg.dat');
tr.draw;

This will create a figure with the tree and the mat file qsm_out.mat, which holds the data for subsequent analysis with gen_scatter2.

Any tree structure can be encoded in the same way with the native basic relation format. (Please refer to Notations for an explanation of the terminology.)
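Before writing tables like these to disk, a few consistency checks can catch shape and indexing mistakes. The sketch below is a hypothetical helper and not part of BayesForest. It repeats the example tables and assumes, based only on the example, that Parent is a 1-based row index into the segment table with 0 meaning "no parent", and that Seg Branch is a 1-based row index into the branch table.

```matlab
% The example tables from above, repeated so the sketch is self-contained.
segment = [
0.05 1.0 0.0 0.0 0.0  0.0   0.0  1.0  0 2 1;
0.03 1.0 0.0 0.0 1.0  0.0   0.0  1.0  1 0 1;
0.03 0.7 0.0 0.0 1.0  0.25  0.25 0.25 1 0 1;
0.03 0.7 0.0 0.0 1.0 -0.25  0.25 0.25 1 0 1;
0.03 0.7 0.0 0.0 1.0 -0.25 -0.25 0.25 1 0 1;
0.03 0.7 0.0 0.0 1.0  0.25 -0.25 0.25 1 0 1
];
branch = [0 0 0 0.08 0; 1 1 0 0.03 45; 1 1 0 0.03 45; 1 1 0 0.03 45; 1 1 0 0.03 45];

n_seg      = size(segment, 1);
parent     = segment(:, 9);    % column 9: Parent (assumed 0 = no parent)
seg_branch = segment(:, 11);   % column 11: Seg Branch (assumed 1-based)

% 11 segment columns and 5 branch columns, as in the format lines above.
ok_cols   = size(segment, 2) == 11 && size(branch, 2) == 5;
% Parent indices are whole numbers that point inside the segment table.
ok_parent = all(parent == round(parent) & parent >= 0 & parent <= n_seg);
% Branch indices point inside the branch table.
ok_branch = all(seg_branch >= 1 & seg_branch <= size(branch, 1));
% Radius and length are positive.
ok_size   = all(segment(:, 1) > 0 & segment(:, 2) > 0);

all_ok = ok_cols && ok_parent && ok_branch && ok_size;
```

If all_ok is false, the individual flags show which check failed.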

B. New features with native format

One can also use the existing function gen_scatter2 as a vehicle to produce new combinations of basic relations and new features from the native QSM format. Because all geometrical and topological features are encoded in the basic relations, any derived feature can in principle be built from them.

To proceed this way, consult help gen_scatter2 and add the new features to the Branch and Segment structs.

This approach also ensures that the visual tree structure is produced during BayesForest operation (plot and movie on).

C. Custom tables

Custom user-defined data can be used by specifying a variable or a tabular file in the configuration file via the option qsm_table. This option is suitable, for example, if the options above do not fit the user's needs. It is also a good choice if the conventional description of a tree in terms of branches with topological orders is not in use.

See help bf_process_input for detail.

Note that in this case BayesForest does not have information on the tree structure corresponding to the custom data sets. Thus, configuration options plot and movie should be set to 0.

One can use the configuration option qsm_tree to refer to a tree class variable in the Matlab workspace, which is then used for the QSM visual representation during optimization (plot and movie can be set to 1 in this case). There are many ways of producing a tree-class structure, e.g. using basic relation tables as in A or using the tree class directly. Note that the SSM must produce a compatible tree-class structure to be visualized against the QSM.

See also help tree.

SSM

SSM is an analytical/theoretical/heuristic/procedural model of tree growth. The function that generates the structural data sets Um must produce scatters compatible with the arrange_scatter output for the QSM or with the custom user data.

The user should write a function that takes the array of parameters to be optimized as input and returns Um.

The configuration option to specify the function is ssm_fun (see help bf_process_input).
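A minimal sketch of the shape such a function could take is given below. The model itself is a made-up toy (a geometric taper of segment radii) and all names are hypothetical. Whether ssm_fun expects a function handle or a function name, and which additional arguments it may pass, should be checked in help bf_process_input.

```matlab
% Hypothetical toy SSM: p(1) is the base radius and p(2) the taper factor
% per segment. It returns a cell array with one table whose columns are
% [rad, len], to be compared against a QSM table with the same columns.
n_seg   = 5;
toy_ssm = @(p) { [p(1) * (p(2) .^ (0:n_seg-1))', 0.2 * ones(n_seg, 1)] };

% One evaluation, as an optimizer would request for a candidate parameter set.
Um = toy_ssm([0.05, 0.9]);
```

The important property is the interface: a parameter array goes in, and a cell array of tables with the same count and column layout as Ud comes out.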

An example of such a function is ssm_lpfg2, which runs a simulation in a system external to Matlab: a model written in L+C and run in the LPFG simulator, which is part of the VLAB/L-studio software (see VLAB or L-studio at http://algorithmicbotany.org/virtual_laboratory/). External simulators are preferable because Matlab is quite slow at simulating tree growth, a task whose resource use grows exponentially. See help ssm_lpfg2 for details.

The two LPFG models suitable for use with ssm_lpfg2 are the LIGNUM pine tree growth model and the Self-organizing tree growth model. These models can serve as a basis for developing custom models that are consistent with BayesForest.

Data format

The LPFG models mentioned above use a specific scheme for writing out the data. This format is read by the BayesForest function read_scatter_dat2. Recall the info and data fields of the Branch and Segment data sets produced by gen_scatter2: the data format that read_scatter_dat2 understands is fully compatible with them.

Example:

# Branch: bra az ltot rini lapar
# Segment: rad len gamma zeta
# order 0
0.131615 0 0 0
0.0670445 0.2 0 93.344
0.067031 0.408526 -100.854 97.7182
0.0667897 0.616695 122.648 64.2943


# order 1
14.8347 43.8704 7.97732 0.0400599 1.1938
40.0978 166.237 5.55932 0.0256525 3.18263
5.08404 83.8318 5.34703 0.0222212 3.38367


0.0400599 0 0 0
0.0388326 0.204462 -60.0237 16.4455
0.0367957 0.411766 110.257 92.6963
0.0367437 0.634008 -0.530325 -9.53711

The header comes first. It specifies the Branch and Segment data codenames (exactly as in gen_scatter2) and appears only once, at the beginning of the file.

Next comes the order information as # order w, where w is the order value.

Below the order directive go two types of data sets: first Branch, then Segment. Note that for order 0 there is no Branch data.

Also note the two empty lines separating data sets. Directives (non-data statements such as # order w) are not separated by empty lines in this way, as the example shows.

The number of columns in a data set corresponds to the number of codenames specified in the header for that type of scatter data.

For instance, in the example above the second segment of order 0 has radius 0.0670445 m and length 0.2 m. The first branch of order 1 has a branching angle of 14.8347 deg and a first-segment radius (rini) of 0.0400599 m. The first segment listed for order 1 has the same radius, because it is the first segment of that branch.
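To make the layout rules concrete, here is a rough sketch of a reader for this format. It is not read_scatter_dat2 and may differ from it in every detail. It works on a cell array of lines rather than a file, and it assumes that within an order the first data block is Branch and the second is Segment, except for order 0, which has only Segment data. The output struct array blocks is a hypothetical representation.

```matlab
% Lines of a shortened scatter file following the layout described above.
lines = { '# Branch: bra az ltot rini lapar', '# Segment: rad len gamma zeta', ...
          '# order 0', '0.131615 0 0 0', '0.0670445 0.2 0 93.344', '', '', ...
          '# order 1', '14.8347 43.8704 7.97732 0.0400599 1.1938', ...
          '40.0978 166.237 5.55932 0.0256525 3.18263', '', '', ...
          '0.0400599 0 0 0', '0.0388326 0.204462 -60.0237 16.4455' };

bra_names = {}; seg_names = {};
blocks = struct('order', {}, 'type', {}, 'table', {});
order = NaN; nblk = 0; cur = [];
lines{end+1} = '';                      % sentinel: flushes the last block
for i = 1:numel(lines)
    ln = strtrim(lines{i});
    if ~isempty(ln) && ln(1) ~= '#'     % data row: append to current block
        cur = [cur; sscanf(ln, '%f')'];
        continue
    end
    if ~isempty(cur)                    % blank line or directive ends a block
        nblk = nblk + 1;
        if order == 0 || nblk == 2
            type = 'segment';
        else
            type = 'branch';
        end
        blocks(end+1) = struct('order', order, 'type', type, 'table', cur);
        cur = [];
    end
    if strncmp(ln, '# Branch:', 9)
        bra_names = strsplit(strtrim(ln(10:end)));
    elseif strncmp(ln, '# Segment:', 10)
        seg_names = strsplit(strtrim(ln(11:end)));
    elseif strncmp(ln, '# order', 7)
        order = str2double(ln(8:end));  % new order: restart block counter
        nblk = 0;
    end
end
```

For the lines above this yields three blocks: a segment table for order 0, then a branch table and a segment table for order 1. A more careful reader would also check each table's column count against the header codenames.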

See Notations and Native format processing.

Distance

The main distance function is dt_distance. During optimization, optim_avg_distance is used to calculate average distances over multiple data sets when needed. The distance is described in Potapov et al., GigaScience. Consult help dt_distance for further reference.

Optimization algorithm

The user can control the behavior of the genetic algorithm (GA) through the configuration options starting with ga_. See help bf_process_input for more details. The GA options themselves are described in the dedicated Matlab documentation pages.