General Dataset
To ease the burden of reading datasets in different formats, our framework enforces a fixed directory structure in which the dataset has to be provided. Moreover, each pointcloud is stored in a dedicated npy-file.
Pointclouds have to be provided in npy-files with the following column structure: x|y|z|(r|g|b)|(i)|(c).
The xyz-coordinates always have to be present, followed by the optional rgb color information, the intensity value and the class label. Although these features are optional, their order must be preserved.
The rgb color information is assumed to be in the range [0,255], and the intensity values are assumed to lie in [-2048,+2048] and are normalized to the range [-1,+1]. For training, a class label is required. However, testing on unlabeled datasets is also possible by omitting this information.
If this default normalization does not meet the needs of a dataset, all parameters can be overridden by adapting the configuration file or by deriving from the dataset class as explained below.
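The column layout described above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the framework's own loader: the helper name split_columns and its three flags are hypothetical, and only the column order and the intensity range are taken from the description above. Colors are left untouched here.

```python
import numpy as np


def split_columns(cloud, has_colors, has_intensity, has_labels):
    """Split an (N, C) array laid out as x|y|z|(r|g|b)|(i)|(c) into named parts.

    Hypothetical helper for illustration; not part of the framework.
    """
    expected = 3 + (3 if has_colors else 0) + int(has_intensity) + int(has_labels)
    if cloud.ndim != 2 or cloud.shape[1] != expected:
        raise ValueError(f"expected {expected} columns, got shape {cloud.shape}")

    parts = {"xyz": cloud[:, 0:3]}
    col = 3
    if has_colors:
        parts["rgb"] = cloud[:, col:col + 3]  # assumed to be in [0, 255]
        col += 3
    if has_intensity:
        # Map the assumed raw range [-2048, +2048] to [-1, +1].
        parts["intensity"] = cloud[:, col] / 2048.0
        col += 1
    if has_labels:
        parts["labels"] = cloud[:, col].astype(np.int64)
    return parts


# Two points with color, intensity and class label: 3 + 3 + 1 + 1 = 8 columns.
cloud = np.array([
    [0.0, 0.0, 0.0, 255, 0, 0, -2048, 2],
    [1.0, 2.0, 3.0, 0, 128, 255, 1024, 5],
], dtype=np.float32)

parts = split_columns(cloud, has_colors=True, has_intensity=True, has_labels=True)
```

An array of this shape could be written with np.save and read back with np.load. For an unlabeled test set, the last column would simply be left out and has_labels set to False.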
The general dataset class is capable of handling k-fold datasets as well as simple train/test datasets.
For every subset of the dataset (e.g. one fold of a k-fold dataset) a separate directory is created. SET_1 in the example below is just a placeholder and can be replaced by an arbitrary but unique identifier for that specific subset.
This directory contains subfolders, of which full_size is mandatory. It holds the pointclouds in their unaltered resolution, which are needed for the final evaluation of the predictor. To increase the training speed, our approaches often operate on downsampled pointclouds, which are placed in the same directory as further subdirectories (e.g. sample_0.1 for a downsampling such that there is one accumulated point in a 10cm by 10cm region).
In each of these directories the npy-files containing the pointclouds are placed, with the names of the full-size files matching those of their downsampled versions.
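One way such a downsampled version could be produced is grid averaging, sketched below. This is only an illustration and not necessarily the procedure used to create the sample_* folders: the function name grid_downsample is hypothetical, a 3D grid cell is assumed, and coordinates are assumed to be in meters so that a cell size of 0.1 corresponds to 10cm. Averaging is only meaningful for coordinates and continuous features; a class label column would need a different rule (e.g. a majority vote) and is therefore left out here.

```python
import numpy as np


def grid_downsample(points, cell_size):
    """Average all points that fall into the same grid cell.

    points: (N, C) array whose first three columns are x, y, z.
    Returns an (M, C) array with one averaged row per occupied cell.
    """
    # Integer cell index of every point along x, y and z.
    cells = np.floor(points[:, :3] / cell_size).astype(np.int64)
    # inverse[i] is the index of the occupied cell that point i belongs to.
    _, inverse = np.unique(cells, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)
    n_cells = int(inverse.max()) + 1

    # Accumulate the rows of each cell, then divide by the number of points in it.
    sums = np.zeros((n_cells, points.shape[1]), dtype=np.float64)
    np.add.at(sums, inverse, points)
    counts = np.bincount(inverse, minlength=n_cells)
    return sums / counts[:, None]


points = np.array([
    [0.01, 0.01, 0.0],
    [0.03, 0.05, 0.0],  # same 0.1 cell as the first point
    [0.25, 0.00, 0.0],  # a different cell
])
downsampled = grid_downsample(points, cell_size=0.1)
```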
dataset-directory
│
└───SET_1
│   │
│   └───full_size
│   │   │   pointcloud_01.npy
│   │   │   pointcloud_02.npy
│   │   │   ...
│   └───sample_0.1
│   │   │   pointcloud_01.npy
│   │   │   pointcloud_02.npy
│   │   │   ...
│   └───sample_0.3
│       │   pointcloud_01.npy
│       │   pointcloud_02.npy
│       │   ...
│
└───SET_2
    │
    └───full_size
    │   │   pointcloud_52.npy
    │   │   pointcloud_53.npy
    │   │   ...
    └───sample_0.1
    │   │   pointcloud_52.npy
    │   │   pointcloud_53.npy
    │   │   ...
    └───sample_0.3
        │   pointcloud_52.npy
        │   pointcloud_53.npy
        │   ...
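The layout above can be traversed with the standard library alone. The following is a rough sketch, assuming only the directory conventions shown in the tree; the function names list_pointclouds and pair_with_full_size are hypothetical and not part of the framework.

```python
from pathlib import Path


def list_pointclouds(data_path, subset, resolution="full_size"):
    """Return the sorted npy-files of one subset at one resolution."""
    folder = Path(data_path) / subset / resolution
    if not folder.is_dir():
        raise FileNotFoundError(f"missing directory: {folder}")
    return sorted(folder.glob("*.npy"))


def pair_with_full_size(data_path, subset, resolution):
    """Pair each downsampled file with the full_size file of the same name."""
    pairs = []
    for sampled in list_pointclouds(data_path, subset, resolution):
        full = Path(data_path) / subset / "full_size" / sampled.name
        if not full.is_file():
            raise FileNotFoundError(f"no full_size counterpart for {sampled}")
        pairs.append((sampled, full))
    return pairs
```

Because the file names match across resolutions, a prediction made on sample_0.1/pointcloud_01.npy can be related back to full_size/pointcloud_01.npy through the name alone.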
As explained in ... the configuration file is the key element of the framework, where all configurations are made and parameters are set to start the training/testing of the network. A section has to be dedicated to integrating the dataset into the pipeline, as shown below:
dataset:
  name: GeneralDataset              # It is possible to define your own datasets (e.g. derive from GeneralDataset)
  num_classes: 13                   # number of unique class labels for the dataset
  data_path: /path/to/the/dataset   # specify the path to the root directory of the dataset
  test_sets: ['SET-1']              # list of subsets used as a validation set
  downsample_prefix: sample_0.1     # subfolder with downsampled pointclouds (if not specified: train on full_size)
  colors: True                      # Is color information included?
  laser: False                      # Is laser intensity information included?
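To illustrate how these keys relate to the data layout, the sketch below mirrors the dataset section as a plain Python dict and derives a few quantities from it. It is an assumption-laden illustration rather than the framework's actual logic: the helper names expected_columns, split_subsets and training_folder are hypothetical, and it is assumed that every subset not listed in test_sets is used for training.

```python
from pathlib import Path

# Mirrors the dataset section of the configuration file shown above.
config = {
    "name": "GeneralDataset",
    "num_classes": 13,
    "data_path": "/path/to/the/dataset",
    "test_sets": ["SET-1"],
    "downsample_prefix": "sample_0.1",
    "colors": True,
    "laser": False,
}


def expected_columns(cfg, labeled=True):
    """Columns per point: xyz, optional rgb, optional intensity, optional label."""
    n = 3
    n += 3 if cfg["colors"] else 0
    n += 1 if cfg["laser"] else 0
    n += 1 if labeled else 0
    return n


def split_subsets(cfg, available):
    """Split subset names into (train, test) according to test_sets."""
    test = [s for s in available if s in cfg["test_sets"]]
    train = [s for s in available if s not in cfg["test_sets"]]
    return train, test


def training_folder(cfg, subset):
    """Folder to read training pointclouds from; full_size if no prefix is set."""
    resolution = cfg.get("downsample_prefix") or "full_size"
    return Path(cfg["data_path"]) / subset / resolution


train_sets, test_sets = split_subsets(config, ["SET-1", "SET-2"])
```

With colors enabled and laser disabled, a labeled pointcloud is expected to have seven columns (x, y, z, r, g, b, c).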