The information below is meant to guide discussion and outline the pointers and considerations that a professional annotation team would weigh in a person detection annotation project.
- Only visible parts of people should be annotated.
- Don't include bags, purses, baby carriages, shopping carts, etc. in a bounding box.
- A person should keep the same identity if they disappear and reappear several times throughout a video.
- Don't annotate people who are very small or very blurry.
CVAT supports multiple annotation formats that may be found here.
- The annotation format is to be chosen by the annotator. One example is the CVAT XML file schema/metadata.
More information on the XML annotation format specifically may be found here. The link describes the tags that are present in the XML, what they mean, and demonstrates an annotation example using annotation boxes, polygons, etc.
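As a rough illustration of how a CVAT XML export might be read back into per-frame records, here is a minimal sketch using Python's standard library. It assumes the "CVAT for video" layout, in which each `<track>` holds `<box>` elements carrying `frame`, `outside`, `occluded`, `keyframe`, `xtl`, `ytl`, `xbr`, and `ybr` attributes. The label name `person`, the sample values, and the helper `boxes_per_frame` are hypothetical. Note that CVAT stores occlusion as a 0/1 flag, so mapping it onto a project-specific occlusion value is left open here.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample in the "CVAT for video" XML layout; the label name
# "person" and all coordinate values are made up for illustration.
SAMPLE_XML = """<annotations>
  <track id="0" label="person">
    <box frame="0" outside="0" occluded="0" keyframe="1" xtl="10.0" ytl="20.0" xbr="50.0" ybr="120.0"/>
    <box frame="1" outside="0" occluded="1" keyframe="0" xtl="12.0" ytl="20.0" xbr="52.0" ybr="120.0"/>
    <box frame="2" outside="1" occluded="0" keyframe="1" xtl="12.0" ytl="20.0" xbr="52.0" ybr="120.0"/>
  </track>
  <track id="1" label="person">
    <box frame="1" outside="0" occluded="0" keyframe="1" xtl="200.0" ytl="30.0" xbr="240.0" ybr="130.0"/>
  </track>
</annotations>"""


def boxes_per_frame(xml_text, label="person"):
    """Return {frame: [record, ...]} for every visible box of the given label."""
    root = ET.fromstring(xml_text)
    frames = defaultdict(list)
    for track in root.iter("track"):
        if track.get("label") != label:
            continue
        # The track id serves as the identity that is maintained over time.
        identity = int(track.get("id"))
        for box in track.iter("box"):
            # outside="1" marks frames where the person has left the view.
            if box.get("outside") == "1":
                continue
            frames[int(box.get("frame"))].append({
                "identity": identity,
                "x1": float(box.get("xtl")),  # top-left corner, horizontal
                "y1": float(box.get("ytl")),  # top-left corner, vertical
                "x2": float(box.get("xbr")),  # bottom-right corner, horizontal
                "y2": float(box.get("ybr")),  # bottom-right corner, vertical
                "occluded": int(box.get("occluded", "0")),
                # keyframe="0" means the box was interpolated, not drawn by hand.
                "interpolated": box.get("keyframe") == "0",
            })
    return dict(frames)


per_frame = boxes_per_frame(SAMPLE_XML)
for frame in sorted(per_frame):
    print(frame, per_frame[frame])
```

Grouping by frame rather than by track matches the per-frame view of the annotation file, while the track id still ties each box back to one person across frames.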
The annotation file should contain the following information per frame (from either manual or interpolated annotations):
Annotation | Annotation Type | Encoded by |
---|---|---|
Person (location) | Rectangular bounding box (x1, y1, x2, y2) | x1: horizontal coordinate of the top-left corner; y1: vertical coordinate of the top-left corner; x2: horizontal coordinate of the bottom-right corner; y2: vertical coordinate of the bottom-right corner |
Identity | Number | Number indicating the person's identity (maintained over time). |
Occlusion | Number | Value: |