This repository contains tools to calculate time budgets from drone videos of zebras and giraffes, using the KABR model to label behavior automatically.
Figure 1: Overview of the pipeline for KABR dataset preparation.
KABR tools requires that torch be installed.
The KABR tools used in this process can be installed with:
pip install torch torchvision
pip install git+https://github.com/Imageomics/kabr-tools
Notes:
- Refer to pytorch.org to install specific versions of torch/CUDA
- detectron2 requires Linux or MacOS.
- If building detectron2's wheel fails, check that gcc and g++ are version 5.4 or newer (run gcc --version and g++ --version).
- SlowFast's setup.py is outdated; our workaround is pip install git+https://github.com/Imageomics/SlowFast@797a6f3ae81c49019d006296f1e0f84f431dc356, which is included when installing kabr_tools.
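The compiler check from the notes above can also be scripted. The following is a minimal sketch and not part of kabr-tools: it runs a compiler's --version command and compares the first version number it reports against 5.4. The helper names `parse_version` and `compiler_ok` are illustrative, and the parsing assumes the usual first-line layout of gcc's version banner.

```python
import re
import shutil
import subprocess

MIN_VERSION = (5, 4)  # minimum gcc/g++ version named in the notes above


def parse_version(text):
    """Return (major, minor) from the first line of a --version banner, or None."""
    lines = text.strip().splitlines()
    if not lines:
        return None
    # Drop parenthesised distro/build tags, which can contain unrelated version numbers.
    cleaned = re.sub(r"\([^)]*\)", " ", lines[0])
    match = re.search(r"(\d+)\.(\d+)", cleaned)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def compiler_ok(name, minimum=MIN_VERSION):
    """Return True if `name` is on PATH and reports a version >= `minimum`."""
    if shutil.which(name) is None:
        return False
    result = subprocess.run([name, "--version"], capture_output=True, text=True)
    version = parse_version(result.stdout)
    return version is not None and version >= minimum


if __name__ == "__main__":
    for compiler in ("gcc", "g++"):
        status = "OK" if compiler_ok(compiler) else "missing or older than 5.4"
        print(compiler, status)
```

On macOS, gcc is often an alias for Apple clang, so the reported number is a clang version and this comparison is only a rough guide there.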
Each KABR tool can be run from the command line (as described below) or imported as a Python module. Each tool has help information, which can be accessed on the command line with <tool-name> -h.
Please refer to our KABR Project Page for additional details on the dataset and original paper.
Figure 2: Clip of drone video containing Plains and Grevy's zebras, plus some impalas.
The drone videos for the KABR dataset were collected at the Mpala Research Centre in January 2023. The missions were flown manually, using a DJI Air 2S drone.
We collaborated with expert ecologists to ensure minimal disturbance to the animals. We launched the drone approximately 200 meters horizontally from the animals and at an altitude of 30-40 meters. We gradually approached the herd from the side by reducing the altitude and horizontal distance and monitoring the animals for signs of vigilance.
Note that the vigilance exhibited by wildlife varies widely by species, habitat, sex, and the level to which animals may be habituated to anthropogenic noise. So, we recommend that you tailor your approach to your particular species and setting.
Please refer to our papers for details on the data collection process:
- KABR: In-Situ Dataset for Kenyan Animal Behavior Recognition from Drone Videos
- A Framework for Autonomic Computing for In Situ Imageomics
- Integrating Biological Data into Autonomous Remote Sensing Systems for In Situ Imageomics: A Case Study for Kenyan Animal Behavior Sensing with Unmanned Aerial Vehicles (UAVs)
In order to automatically label the animal videos with behavior, we must first create mini-scenes of each individual animal captured in the frame, illustrated below.
See the Wiki CVAT User Guide and Data Management Tips for detailed instructions and recommendations.
Figure 3: A mini-scene is a sub-image cropped from the drone video footage centered on and surrounding a single animal. Mini-scenes simulate the camera as well-aligned with each animal in the frame, compensating for the drone's movement by focusing on just the animal and its immediate surroundings. The KABR dataset consists of mini-scenes and their frame-by-frame behavior annotation.
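To make the idea of a mini-scene concrete, here is a minimal sketch of how a crop window centered on one animal could be computed. It is an illustration under stated assumptions, not necessarily how tracks_extractor does it: the function name `crop_window`, the fixed scene size, and the clamping behavior at the frame edge are all hypothetical choices.

```python
def crop_window(box, frame_size, scene_size):
    """Return (left, top, right, bottom) of a crop centered on a bounding box.

    box        -- (xtl, ytl, xbr, ybr) in pixels
    frame_size -- (width, height) of the full drone frame
    scene_size -- (width, height) of the mini-scene (assumed fixed)
    """
    xtl, ytl, xbr, ybr = box
    frame_w, frame_h = frame_size
    scene_w, scene_h = scene_size
    if scene_w > frame_w or scene_h > frame_h:
        raise ValueError("mini-scene cannot be larger than the frame")

    # Center the window on the middle of the bounding box.
    center_x = (xtl + xbr) / 2
    center_y = (ytl + ybr) / 2
    left = int(round(center_x - scene_w / 2))
    top = int(round(center_y - scene_h / 2))

    # Shift the window back inside the frame when the animal is near an edge.
    left = max(0, min(left, frame_w - scene_w))
    top = max(0, min(top, frame_h - scene_h))
    return left, top, left + scene_w, top + scene_h


window = crop_window((900, 500, 1000, 560), (1920, 1080), (400, 300))
print(window)
```

For a frame stored as an array indexed by (row, column), the mini-scene pixels would then be `frame[top:bottom, left:right]`. Recomputing the window for every frame of a track is what keeps the animal centered as the drone moves.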
To create mini-scenes, we first must perform the detection step, by drawing bounding boxes around each animal in frame.
See data/mini_scenes on Hugging Face for example mini-scenes.
Figure 4: Simplified CVAT annotation tool interface
Upload your raw videos to CVAT and perform the detections by drawing bounding boxes manually. This can be quite time-consuming, but has the advantage of generating highly accurate tracks. Depending on the resolution of your raw video, you may encounter out-of-space issues with CVAT. You can use downgrade.sh to reduce the size of your videos.
You may use YOLO to automatically perform detection on your videos. Use the script below to convert YOLO detections to CVAT format.
detector2cvat: Detect objects with Ultralytics YOLO, apply SORT tracking, and convert the tracks to CVAT format.
detector2cvat --video path_to_videos --save path_to_save [--imshow]
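The core of converting a detection to CVAT format is a coordinate change. The sketch below illustrates it under two assumptions: the detection is a YOLO-style normalized box (x_center, y_center, width, height, each in the range 0 to 1), and the target is a CVAT box described by its pixel corners xtl, ytl, xbr, ybr. The function name `yolo_to_cvat_box` is hypothetical, and detector2cvat may work from a different box representation internally.

```python
def yolo_to_cvat_box(xywhn, frame_width, frame_height):
    """Convert a normalized center/size box to CVAT-style pixel corners."""
    x_center, y_center, width, height = xywhn

    def clamp(value, upper):
        # Keep corners inside the frame and round to two decimals.
        return round(max(0.0, min(value, upper)), 2)

    return {
        "xtl": clamp((x_center - width / 2) * frame_width, frame_width),
        "ytl": clamp((y_center - height / 2) * frame_height, frame_height),
        "xbr": clamp((x_center + width / 2) * frame_width, frame_width),
        "ybr": clamp((y_center + height / 2) * frame_height, frame_height),
    }


box = yolo_to_cvat_box((0.5, 0.5, 0.25, 0.5), frame_width=1920, frame_height=1080)
print(box)
```

A tracker then assigns each box to a track, so that the same animal keeps the same track across frames.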
Once you have your tracks generated, use them to create mini-scenes from your raw footage.
tracks_extractor: Extract mini-scenes from CVAT tracks.
tracks_extractor --video path_to_videos --annotation path_to_annotations [--tracking] [--imshow]
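If you want to drive the two steps above from a script, one option is to build the command lines in Python and run them with subprocess. This is a sketch only: it uses just the flags shown in the usage lines above, passes paths through unchanged, and assumes that the tracks saved by detector2cvat are what tracks_extractor expects for --annotation. The helper names `build_command` and `run_pipeline` are hypothetical.

```python
import shlex
import subprocess


def build_command(tool, options, flags=()):
    """Build an argv list such as ['tool', '--video', 'x', '--imshow']."""
    command = [tool]
    for name, value in options.items():
        command += [f"--{name}", str(value)]
    command += [f"--{flag}" for flag in flags]
    return command


def run_pipeline(video, tracks, dry_run=True):
    """Print (and optionally run) detection followed by mini-scene extraction."""
    commands = [
        build_command("detector2cvat", {"video": video, "save": tracks}),
        # Assumption: the path given to --save above is valid input for --annotation.
        build_command("tracks_extractor", {"video": video, "annotation": tracks}),
    ]
    for command in commands:
        print(shlex.join(command))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(command, check=True)
    return commands


run_pipeline("path_to_videos", "path_to_annotations")
```

With `dry_run=True` (the default here) nothing is executed, which makes it easy to inspect the commands before committing to a long run.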
You can use the KABR model on Hugging Face to label the mini-scenes with behavior. See the ethogram folder for the list of behaviors used to label the zebra videos.
To use the KABR model, download checkpoint_epoch_00075.pyth.zip from Hugging Face, unzip checkpoint_epoch_00075.pyth, and install SlowFast. Then run miniscene2behavior.py.
Label the mini-scenes:
miniscene2behavior [--config path_to_config] --checkpoint path_to_checkpoint [--gpu_num number_of_gpus] --miniscene path_to_miniscene [--output path_to_output_csv]
Notes:
- If the config hasn't been extracted yet, the script will write it to config.
- checkpoint should be the path to checkpoint_epoch_00075.pyth.
- If gpu_num is 0, the model will use CPU. Using at least 1 GPU greatly increases inference speed. If you're using OSC, you can request a node with one GPU by running sbatch -N 1 --gpus-per-node 1 -A [account] --time=[minutes] [bash script].
- Mini-scenes are clipped videos focused on individual animals, and video is the raw video file from which mini-scenes have been extracted.
See these CSV files on Hugging Face for examples of annotated mini-scene outputs.
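As a rough illustration of how per-frame behavior labels become a time budget, the sketch below counts the fraction of frames each animal spends in each behavior. The column names `track`, `frame`, and `label`, the sample rows, and the behavior names are assumptions made for this example; check the actual CSV files for the real layout before adapting it.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical sample in an assumed layout: one row per animal per frame.
SAMPLE_CSV = """track,frame,label
1,0,Graze
1,1,Graze
1,2,Walk
1,3,Graze
2,0,Walk
2,1,Walk
"""


def time_budgets(rows):
    """Return {track: {label: fraction of that track's frames}}."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["track"]][row["label"]] += 1

    budgets = {}
    for track, counter in counts.items():
        total = sum(counter.values())
        budgets[track] = {label: n / total for label, n in counter.items()}
    return budgets


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
budgets = time_budgets(rows)
for track, budget in budgets.items():
    print(track, budget)
```

Because every frame covers the same amount of time, frame fractions equal time fractions as long as no frames are missing from a track.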
See the time budgets notebook for the code to create these visualizations.
Figure 5: Example flight path and video clip from the KABR dataset: 2 male Grevy's zebras observed for 10 minutes on 01/18/23.
Figure 6: Overall time budget for the duration of the 10-minute observation
Figure 7: Gantt chart for each zebra (3-minute duration)
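A Gantt chart like the one in Figure 7 needs behavior bouts (a label with a start and end time) rather than per-frame labels. The following minimal sketch collapses consecutive identical labels into bouts. It assumes the labels are ordered by frame with no gaps, and the frame rate of 30 frames per second is an assumed value; it is not taken from the time budgets notebook.

```python
from itertools import groupby


def behavior_bouts(labels, fps=30.0):
    """Collapse per-frame labels into (label, start_seconds, end_seconds) bouts.

    labels -- behavior labels for one animal, ordered by frame, no gaps
    fps    -- frames per second of the video (30.0 is an assumption)
    """
    bouts = []
    frame = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)  # number of consecutive frames with this label
        bouts.append((label, frame / fps, (frame + length) / fps))
        frame += length
    return bouts


labels = ["Graze"] * 60 + ["Walk"] * 30 + ["Graze"] * 30
for bout in behavior_bouts(labels):
    print(bout)
```

Each bout maps to one horizontal bar; in matplotlib, for example, `Axes.broken_barh` accepts (start, width) pairs, so a bout would be passed as `(start, end - start)`.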
If you wish to use YOLO to automatically generate detections, you may want to fine-tune your YOLO model for your dataset using the train_yolo notebook.
cvat2ultralytics: Convert CVAT annotations to an Ultralytics YOLO dataset.
cvat2ultralytics --video path_to_videos --annotation path_to_annotations --dataset dataset_name [--skip skip_frames]
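For orientation, a YOLO label file holds one line per object: a class index followed by the normalized box center and size. The sketch below shows that conversion for a single CVAT-style box given as pixel corners. It is an illustration only; the function name `cvat_box_to_yolo_line` is hypothetical, and cvat2ultralytics handles the full dataset layout, which is not shown here.

```python
def cvat_box_to_yolo_line(class_index, box, frame_width, frame_height):
    """Format one box as a YOLO label line: 'class x_center y_center width height'.

    box -- (xtl, ytl, xbr, ybr) pixel corners, as stored in CVAT annotations
    """
    xtl, ytl, xbr, ybr = box
    # Normalize center and size by the frame dimensions so values fall in [0, 1].
    x_center = (xtl + xbr) / 2 / frame_width
    y_center = (ytl + ybr) / 2 / frame_height
    width = (xbr - xtl) / frame_width
    height = (ybr - ytl) / frame_height
    values = (x_center, y_center, width, height)
    return f"{class_index} " + " ".join(f"{value:.6f}" for value in values)


line = cvat_box_to_yolo_line(0, (720, 270, 1200, 810), 1920, 1080)
print(line)
```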
player: Player for tracking and behavior observation.
player --folder path_to_folder [--save] [--imshow]
Figure 8: Example player.py output.
cvat2slowfast: Convert CVAT annotations to a dataset in Charades format.
cvat2slowfast --miniscene path_to_mini_scenes --dataset dataset_name --classes path_to_classes_json [--old2new path_to_old2new_json] [--no_images]