SPREAD is a synthetic dataset for image-based visual tasks in forestry. The currently supported tasks include tree detection, tree/trunk segmentation, canopy segmentation, tree species recognition, and image-based estimation of key tree parameters (DBH, tree height, and canopy diameter). SPREAD includes RGB images, depth images, segmentation maps (instance and semantic), point clouds, key tree parameters (DBH, height, and canopy diameter), and metadata for each tree in the scene (species, location, size, etc.). The dataset is collected from 13 different photo-realistic virtual scenes generated with Unreal Engine 5, covering forests (6 different biomes), urban areas (4 different scenes), and plantations. Data collection relies heavily on the Colosseum simulator and our custom Blueprint program. We have open-sourced the entire pipeline used to collect SPREAD, allowing researchers to adapt this framework to create datasets better suited to their own research.
- Synthetic Photo-Realistic Arboreal Dataset (SPREAD)
One of the primary motivations for creating SPREAD was to address the scarcity of annotated forest imagery and forest inventory data in the real world. We therefore leveraged Unreal Engine 5 to create highly realistic virtual scenes that closely resemble real-world environments, from which we collected accurately annotated images and precise tree parameters. We considered tree distribution, background context, and dataset application scenarios to construct three types of environments: forests, urban areas, and plantations.
For the forest environment, SPREAD currently includes six different forest scenes, each representing a distinct biome: tropical rainforest, redwood forest, birch forest, burned forest, meadow forest, and deciduous forest. For urban environments, we considered elements that could interfere with tree detection and segmentation, such as utility poles, fire hydrants, and complex backgrounds. We built four urban scenes: two distinct downtown areas, a suburban area, and an urban park. Additionally, we developed a plantation scene where fruit trees are neatly arranged and grow uniformly within the same plot.
In terms of image modalities, SPREAD includes RGB images, depth maps, semantic segmentation maps, and instance segmentation maps. Beyond near-ground samples, SPREAD also includes drone-view images, thereby supporting canopy segmentation tasks. Some example images are shown below.
SPREAD contains approximately 37,000 ground samples and 19,000 drone-view samples. Each sample includes RGB images, depth maps, segmentation maps, point clouds, metadata, and parameters for all trees within the field of view (tree ID, location, DBH, height, and canopy diameter). Basic information about each scene and the distribution of key tree parameters are shown in diagram (b) below. All samples were collected under up to 11 different weather conditions, with the weather distribution shown in diagram (c) below.
To identify the trees within a given image and retrieve their corresponding parameters, please refer to the figure below. As an example from the broadleaf dataset, the following key files can be found in the dataset uploaded to Zenodo:
- rgb/Tree0_1720149451.png
- instance_segmentation/Tree0_1720149451.png
- instance_segmentation/Tree0_1720149451.txt
- color_palette.xlsx
- obj_info_final.xlsx
To extract detailed information about all the trees in a given RGB image, follow the four steps illustrated in the figure above (a code sketch follows the list):
- Step 1. Extract all the RGB values from the instance segmentation map.
- Step 2. Cross-reference these RGB values with color_palette.xlsx to identify the corresponding color index.
- Step 3. Use the color index to find the tree IDs in the corresponding metadata file (.txt) in the instance_segmentation folder.
- Step 4. Input the obtained tree IDs into obj_info_final.xlsx to retrieve the trees’ location information and parameters.
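As a rough illustration of these four steps, the Python sketch below walks through the example files listed above. The column names assumed for color_palette.xlsx and obj_info_final.xlsx, and the assumed line layout of the per-image .txt metadata file, are placeholders; check them against the actual files in the Zenodo archive before use.

```python
# A minimal sketch of the four lookup steps above. Column names ("R", "G", "B",
# "color_index", "tree_id") and the "<color_index> <tree_id>" layout of the
# per-image .txt file are assumptions; adjust them to match the real files.
import numpy as np
import pandas as pd
from PIL import Image

# Step 1: collect every unique RGB value in the instance segmentation map.
seg = np.array(Image.open("instance_segmentation/Tree0_1720149451.png").convert("RGB"))
unique_colors = {tuple(c) for c in seg.reshape(-1, 3)}

# Step 2: map each RGB value to its color index via color_palette.xlsx.
palette = pd.read_excel("color_palette.xlsx")  # assumed columns: R, G, B, color_index
color_to_index = {(row.R, row.G, row.B): row.color_index for row in palette.itertuples()}
indices = {color_to_index[c] for c in unique_colors if c in color_to_index}

# Step 3: resolve color indices to tree IDs using the per-image metadata file.
index_to_tree = {}
with open("instance_segmentation/Tree0_1720149451.txt") as f:
    for line in f:
        if not line.strip():
            continue
        color_index, tree_id = line.split()  # assumed "<color_index> <tree_id>" per line
        index_to_tree[int(color_index)] = tree_id
tree_ids = [index_to_tree[i] for i in indices if i in index_to_tree]

# Step 4: look up location and parameters (DBH, height, canopy diameter).
obj_info = pd.read_excel("obj_info_final.xlsx")  # assumed column: tree_id
print(obj_info[obj_info["tree_id"].isin(tree_ids)])
```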
SPREAD was collected using a highly scalable and customizable data collection framework, which you can use in your own game levels to collect RGB images, depth maps, and segmentation maps. The following steps require some familiarity with UE5, Blueprints, and Python.
You can find many beautifully crafted environment asset packs in the Unreal Marketplace. You can adapt the demo levels of these asset packs to create a new level that works with the SPREAD data collection framework. If you want to modify the data collection framework as little as possible, the main game level (assumed to be named Main_Map) should meet the following criteria:
- The level must include a landscape, instanced foliage actors (rocks, shrubs, etc.), static mesh actors (trees), and Ultra Dynamic Sky and Weather (a weather plugin).
- We emphasize that every tree in the game level must be a static mesh actor. Trees generated procedurally in UE5 are often represented as instanced foliage actors, in which case you need to convert them into static mesh actors. You may find the MultiTool plugin very useful for such conversions.
- The trees in the level must be named starting with "Tree" and placed together in a folder named "Tree." We recommend naming trees Tree0, Tree1, Tree2, etc. For batch renaming, you can use the Multi Objects Renaming Tool plugin.
In addition, you need to create a level that contains only a landscape (assumed to be named Landscape_Map), which will be used later to obtain ground point information for determining the camera height when capturing images.
If obtaining the plugins or asset packs mentioned above is difficult, you can refer to XX for the necessary code or Blueprint modifications, commenting out or removing the code or Blueprint nodes that reference the missing components.
We recommend referring to the detailed documentation of AirSim (the predecessor of Colosseum) to configure Colosseum. When you can successfully run the level in Main_Map with the AirSimGameMode, Colosseum has been configured correctly.
- Step 1: Place the BP_FunctionKit.uasset, Trunk_Highlighter.uasset, and LandscapeSampler.uasset files from the UE_Assets folder into your current UE project.
- Step 2: Drag the BP_FunctionKit into the Main_Map, then double-click to view its blueprint graph and ensure that all nodes in the blueprint are displayed correctly.
- Step 3: Open the level blueprint of Main_Map and copy the level blueprint from this URL into your level blueprint. Check whether all nodes in the level blueprint are displayed correctly; if not, you need to connect them manually. This completes the configuration of Main_Map.
- Step 1: Copy the landscape from Main_Map into Landscape_Map and ensure that the landscape's name is "LandScape."
- Step 2: Drag the LandscapeSampler into Landscape_Map, then double-click it and set the sampling parameters and the file saving path. By default, sampling is done every 30 cm.
Review the asset details of each static mesh (tree) in the level individually, identify the material indices of the slots belonging to the leaf parts, and record them in the leaf_material_index.xlsx file in the format "SM_Name Leaf_Material_Index," as in the example below. If a tree has multiple leaf parts, separate the material indices with spaces.
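For example, if a tree mesh named SM_Birch_01 has its leaves in material slots 2 and 4, and another mesh named SM_Redwood_03 has a single leaf slot at index 5, the recorded rows would look like the following (the mesh names here are hypothetical):

```
SM_Birch_01 2 4
SM_Redwood_03 5
```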
- Step 1: Install all the required Python libraries
pip install -r requirements.txt
- Step 2: Modify the parameters in the data_collection.py file:
# Modify these parameters
NUMBER_OF_SAMPLES = 3000 # Number of samples you want to collect
MIN_CAPTURE_DISTANCE = 1 # Minimum distance from the camera to the tree
MAX_CAPTURE_DISTANCE = 5 # Maximum distance from the camera to the tree
CAPTURE_HIEGHT = 2 # Camera height
MAX_YAW_DEVIATION = 20 # Maximum yaw deviation of the camera
MAX_PITCH_DEVIATION = 3 # Maximum pitch deviation of the camera
MAX_ROLL_DEVIATION = 3 # Maximum roll deviation of the camera
MAX_DISTANCE_TO_OBJECT = 200 # Objects beyond this distance will not be recorded in the segmentation map
- Step 1: Run the Landscape_Map level, which will generate a landscape_info.txt file at the path you set, containing ground point information.
- Step 2: Exit the Landscape_Map level, open the Main_Map level, and run the level in AirSimGameMode.
- Step 3: Run the data_collection.py script, which will generate RGB, depth maps, and segmentation maps at the path you set.
- Step 4 (optional): Run the generate_segmentation_product.ipynb script to generate semantic segmentation maps. Alternatively, you can generate other segmentation products from the instance segmentation maps according to your needs (see the sketch below).
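As one example of such a product, the sketch below collapses an instance segmentation map into a binary tree/background mask. It assumes that non-tree pixels are pure black in the instance map; verify this against your own segmentation output before relying on it.

```python
# A minimal sketch: derive a binary tree/background mask from an instance
# segmentation map. Assumes non-tree pixels are pure black (0, 0, 0).
import numpy as np
from PIL import Image

instance = np.array(Image.open("instance_segmentation/Tree0_1720149451.png").convert("RGB"))
mask = (instance.sum(axis=-1) > 0).astype(np.uint8) * 255  # 255 = tree, 0 = background
Image.fromarray(mask).save("Tree0_1720149451_mask.png")
```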
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Distributed under the MIT License. See LICENSE.txt for more information.
Our Group Link: Energy and Environment Group