
General Dataset

Jonas Schult edited this page Oct 5, 2018 · 1 revision


To ease the burden of reading in datasets of different formats, our framework enforces a fixed directory structure in which the dataset has to be provided. Moreover, each pointcloud is stored in a dedicated npy file.

Numpy File Format

Pointclouds have to be provided as npy files with the column layout x|y|z|(r|g|b)|(i)|(c). The xyz coordinates must always be present; they can be followed by optional rgb color information, an intensity value and the class label. Even though these features are optional, their order must be preserved. The rgb color information is assumed to lie in the range [0,255], and the intensity values are assumed to lie in [-2048,+2048] and are normalized to the range [-1,+1]. A class label is required for training; however, testing on unlabeled datasets is also possible by omitting this column. If this default normalization does not fit a dataset, all parameters can be overridden by adapting the configuration file or by deriving from the dataset class as explained below.
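The column layout can be illustrated with a small NumPy sketch. It is not the framework's loader: the function name `split_pointcloud` and its flags are hypothetical (the `colors` and `laser` flags mirror the keys of the configuration file), and mapping the intensity to [-1,+1] by dividing by 2048 is an assumption consistent with the ranges stated above. Colors are returned unchanged.

```python
import numpy as np


def split_pointcloud(cloud, colors=True, laser=False, labeled=True):
    """Split an (N, F) array with column layout x|y|z|(r|g|b)|(i)|(c)."""
    # The optional column groups are fixed in order, so the width is determined.
    n_cols = 3 + 3 * int(colors) + int(laser) + int(labeled)
    if cloud.ndim != 2 or cloud.shape[1] != n_cols:
        raise ValueError(f"expected shape (N, {n_cols}), got {cloud.shape}")

    parts = {"xyz": cloud[:, 0:3]}
    col = 3
    if colors:
        parts["rgb"] = cloud[:, col:col + 3]  # assumed in [0, 255], left as-is
        col += 3
    if laser:
        # Assumption: linear mapping of [-2048, +2048] onto [-1, +1].
        parts["intensity"] = cloud[:, col] / 2048.0
        col += 1
    if labeled:
        parts["label"] = cloud[:, col].astype(np.int64)  # required for training
    return parts


# Two labeled points with color and intensity: x|y|z|r|g|b|i|c
cloud = np.array([
    [0.0, 1.0, 2.0, 255, 0, 0, 1024, 3],
    [1.5, 0.5, 0.2, 0, 128, 255, -2048, 7],
], dtype=np.float32)
parts = split_pointcloud(cloud, colors=True, laser=True, labeled=True)
```

Such an array would be stored with `np.save("pointcloud_01.npy", cloud)`. For the array above, `parts["intensity"]` holds 0.5 and -1.0 and `parts["label"]` holds 3 and 7.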

Directory Structure

The general dataset class can handle k-fold datasets as well as simple train/test datasets. Every subset of the dataset (e.g. one fold of a k-fold dataset) gets its own directory. SET_1 in the example below is just a placeholder and can be replaced by an arbitrary but unique identifier for that subset. Each subset directory contains several subdirectories, of which full_size is mandatory. It holds the pointclouds in their original resolution, which are needed for the final evaluation of the predictor. To increase training speed, our approaches often operate on downsampled pointclouds, which are placed in additional subdirectories (e.g. sample_0.1 for a downsampling that keeps one accumulated point per 10cm by 10cm region). Each of these directories contains the npy files of the pointclouds, and every downsampled file has the same name as its full-size counterpart.

dataset-directory
|
└───SET_1
│   │
│   └───full_size
│   │       pointcloud_01.npy
│   │       pointcloud_02.npy
│   │       ...
│   └───sample_0.1
│   │       pointcloud_01.npy
│   │       pointcloud_02.npy
│   │       ...
│   └───sample_0.3
│       │   pointcloud_01.npy
│       │   pointcloud_02.npy
│       │   ...
│   
└───SET_2
    │
    └───full_size
    │       pointcloud_52.npy
    │       pointcloud_53.npy
    │       ...
    └───sample_0.1
    │       pointcloud_52.npy
    │       pointcloud_53.npy
    │       ...
    └───sample_0.3
        │   pointcloud_52.npy
        │   pointcloud_53.npy
        │   ...
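How a sample_* directory could be filled is sketched below. The framework's actual downsampling procedure is not described on this page, so the scheme here is an assumption: points are grouped into cubic voxels of edge `voxel_size` (in the units of the coordinates, e.g. 0.1 for 10cm if they are in meters), all feature columns are averaged per voxel, and the label is chosen by majority vote. `voxel_downsample` and `build_downsampled_set` are hypothetical names, and labels are assumed to be non-negative integers in the last column.

```python
from pathlib import Path

import numpy as np


def voxel_downsample(cloud, voxel_size, labeled=True):
    """Accumulate all points of a voxel into one point (hypothetical scheme)."""
    # Integer voxel coordinates of every point.
    keys = np.floor(cloud[:, :3] / voxel_size).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # flat voxel index per point
    n_voxels = int(inverse.max()) + 1

    # Average every feature column (xyz, rgb, intensity) per voxel.
    feats = cloud[:, :-1] if labeled else cloud
    sums = np.zeros((n_voxels, feats.shape[1]))
    np.add.at(sums, inverse, feats)
    counts = np.bincount(inverse, minlength=n_voxels)
    means = sums / counts[:, None]
    if not labeled:
        return means

    # Majority vote over the (non-negative integer) labels of each voxel.
    labels = cloud[:, -1].astype(np.int64)
    votes = np.zeros((n_voxels, int(labels.max()) + 1), dtype=np.int64)
    np.add.at(votes, (inverse, labels), 1)
    return np.column_stack([means, votes.argmax(axis=1)])


def build_downsampled_set(set_dir, voxel_size, labeled=True):
    """Write sample_<voxel_size>/ next to full_size/, keeping the file names."""
    set_dir = Path(set_dir)
    out_dir = set_dir / f"sample_{voxel_size}"
    out_dir.mkdir(exist_ok=True)
    for path in sorted((set_dir / "full_size").glob("*.npy")):
        np.save(out_dir / path.name, voxel_downsample(np.load(path), voxel_size, labeled))
```

Calling `build_downsampled_set("dataset-directory/SET_1", 0.1)` would create `SET_1/sample_0.1` with one file per full-size pointcloud, reusing the file names so that both resolutions stay matched.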

Configuration File Integration

As explained in ..., the configuration file is the key element of the framework: all settings are made and all parameters are set there before the training/testing of the network is started. A dedicated section integrates the dataset into the pipeline, as shown below:

dataset:
  name: GeneralDataset # It is possible to define your own datasets (e.g. derive from GeneralDataset)
  num_classes: 13 # number of unique class labels for the dataset
  data_path: /path/to/the/dataset # specify the path to the root directory of the dataset
  test_sets: ['SET_1'] # list of subsets used as a validation set (names must match the subset directories)
  downsample_prefix: sample_0.1 # subfolder with downsampled pointclouds (if not specified: train on full_size)
  colors: True # Is color information included?
  laser: False # Is laser intensity information included?
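The sketch below shows how such a section could be turned into file lists, assuming it has already been parsed into a dict (e.g. with PyYAML's `yaml.safe_load`). `resolve_splits` is a hypothetical helper, not part of the framework. Training files are taken from `downsample_prefix`, falling back to full_size as the comment above states. Test files are taken from full_size on the assumption that the final evaluation runs at full resolution. Subsets listed in `test_sets` without a matching directory raise an error, which catches naming mismatches early.

```python
from pathlib import Path


def resolve_splits(cfg):
    """Return (train_files, test_files) for a parsed `dataset` config section."""
    root = Path(cfg["data_path"])
    # Train on the downsampled clouds if a prefix is given, else on full_size.
    train_subdir = cfg.get("downsample_prefix") or "full_size"
    test_sets = set(cfg.get("test_sets", []))

    subsets = sorted(p for p in root.iterdir() if p.is_dir())
    missing = test_sets - {p.name for p in subsets}
    if missing:
        raise ValueError(f"test_sets without matching directory: {sorted(missing)}")

    train_files, test_files = [], []
    for subset in subsets:
        if subset.name in test_sets:
            # Assumption: evaluation uses the unaltered full-size clouds.
            test_files += sorted((subset / "full_size").glob("*.npy"))
        else:
            train_files += sorted((subset / train_subdir).glob("*.npy"))
    return train_files, test_files


# Mirrors the YAML section above after parsing.
cfg = {
    "name": "GeneralDataset",
    "num_classes": 13,
    "data_path": "/path/to/the/dataset",
    "test_sets": ["SET_1"],
    "downsample_prefix": "sample_0.1",
    "colors": True,
    "laser": False,
}
```

With a real `data_path`, `train_files, test_files = resolve_splits(cfg)` would return the files of SET_2 (and any further subsets) from sample_0.1 for training and the files of SET_1 from full_size for testing.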