PTDT-3807: Add temporal audio annotation support #2013
base: develop
classifications: Optional[
    List[Union["TemporalClassificationText", "TemporalClassificationQuestion"]]
] = None
feature_schema_id: Optional[Cuid] = None
What is feature_schema_id for?
We don't actually use it. Removing it for now.
addressed
    List[Union["TemporalClassificationText", "TemporalClassificationQuestion"]]
] = None
feature_schema_id: Optional[Cuid] = None
extra: Dict[str, Any] = Field(default_factory=dict)
What is extra for? Do we use it anywhere?
We don't actually use it. Removing it for now.
addressed
yield NDObject.from_common(segments, label.data)

@classmethod
def _create_temporal_annotations(
Should we call it _create_temporal_classifications or something like that, to differentiate these from temporal annotations (video bboxes, polylines, etc.)?
DISCARD: If you refer to L74-L78 in this file, there is a function called create video annotations, and this one follows the established structure of this class. Based on that, lmk if you think we should change it; imho we should keep it as is. Lmk if you have a strong opinion here.
Changing to Classifications
addressed
dataRow: Dict[str, str]


def create_temporal_ndjson_annotations(
This function should probably also be renamed to create_temporal_ndjson_classifications or create_temporal_ndjson_classification_annotations.
Hmm, I think classification is actually better than annotation. Classification != Annotation, so we should use classification for this.
addressed
Branch updated from 5311011 to 9afd82d
Description
This PR introduces Audio Temporal Annotations, a new Labelbox SDK feature that enables precise time-based annotations for audio files, including temporal classification annotations with millisecond-level timing precision.
Motivation: Audio annotation workflows require precise timing control for applications like:
Context: This feature extends the existing audio annotation infrastructure to support temporal annotations, using a millisecond-based timing system that provides the precision needed for audio applications while maintaining compatibility with the existing NDJSON serialization format.
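To illustrate the millisecond-based timing described above, here is a minimal sketch that converts second-based timestamps into integer millisecond ranges. The helper to_ms_range is hypothetical and not part of the Labelbox SDK. It assumes inclusive (start_ms, end_ms) integer tuples, matching the frames: List[Tuple[int, int]] shape described under Frame Assignment Logic.

```python
from typing import Tuple


def to_ms_range(start_s: float, end_s: float) -> Tuple[int, int]:
    """Convert a (start, end) pair in seconds to integer milliseconds.

    Hypothetical helper: assumes an inclusive range, non-negative
    timestamps, and start <= end.
    """
    if start_s < 0 or end_s < start_s:
        raise ValueError(f"invalid range: {start_s}s to {end_s}s")
    # round() rather than int() so that floating-point error such as
    # 4349.9999 lands on 4350 instead of being truncated to 4349.
    return round(start_s * 1000), round(end_s * 1000)


# A 1.5 s to 2.25 s audio segment becomes (1500, 2250).
print(to_ms_range(1.5, 2.25))
```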
Type of change
All Submissions
New Feature Submissions
Changes to Core Features
Summary of Changes
New Audio Temporal Annotation Types
AudioClassificationAnnotation: Time-based classifications (radio, checklist, text) for audio segments
Core Infrastructure Updates
TemporalFrame, AnnotationGroupManager, ValueGrouper, and HierarchyBuilder components
temporal.py module with generic components that can be reused for video, audio, and other temporal annotation types
Code Architecture Improvements
Generic[TemporalAnnotation] for static type checking
frame_extractor callable that allows different annotation types to use the same processing logic
overlaps() method and improved temporal containment logic
create_audio_ndjson_annotations() convenience function
Testing
test_v3_serialization.py (attached at the bottom) that validates both structure and values
Documentation & Examples
audio.ipynb with temporal annotation examples
demo_audio_token_temporal.py showing per-token temporal annotations
Serialization & Import Support
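NDJSON places one JSON object on each line. The minimal sketch below shows that format for temporal classification records and uses only the standard json module. The record fields (name, answer, frames) are hypothetical placeholders and are not the SDK's actual NDJSON schema.

```python
import json
from typing import Any, Dict, Iterable, List


def to_ndjson(records: Iterable[Dict[str, Any]]) -> str:
    """Serialize records as newline-delimited JSON (one object per line)."""
    # Compact separators keep each line short; sort_keys makes the output
    # deterministic, which is convenient for snapshot-style tests.
    return "\n".join(
        json.dumps(r, separators=(",", ":"), sort_keys=True) for r in records
    )


def from_ndjson(text: str) -> List[Dict[str, Any]]:
    """Parse NDJSON back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Hypothetical record shape: frames are [start_ms, end_ms] pairs.
# Lists (not tuples) are used so the JSON round trip compares equal.
records = [
    {"name": "speaker", "answer": "user", "frames": [[0, 1200], [3000, 4100]]},
    {"name": "transcript", "answer": "hello", "frames": [[0, 1200]]},
]
text = to_ndjson(records)
print(text)
assert from_ndjson(text) == records
```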
Key Features
Simple Text Classification
Radio/Checklist with Temporal Ranges
Nested Classifications (Arbitrary Depth)
Serialization to NDJSON
Ontology Setup
Technical Architecture
New Temporal Classification API
The SDK now provides a simplified, recursive temporal classification interface that handles audio, video, and other time-based media:
Core Classes (temporal.py)
TemporalClassificationText
TemporalClassificationQuestion (Radio/Checklist)
TemporalClassificationAnswer
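The recursive shape of these core classes can be sketched with plain dataclasses. This is a hypothetical model for illustration and not the SDK's actual definitions. The field names (name, value, answers, frames, classifications) are assumptions. Frames are stored as (start_ms, end_ms) tuples, following the frames: List[Tuple[int, int]] description in this PR.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Tuple, Union

Frames = List[Tuple[int, int]]  # discontinuous (start_ms, end_ms) ranges


@dataclass
class TemporalClassificationText:
    """Free-text classification valid over one or more time ranges."""
    name: str
    value: str
    frames: Frames


@dataclass
class TemporalClassificationAnswer:
    """One radio/checklist option; may carry nested classifications."""
    name: str
    frames: Frames
    classifications: List[Nested] = field(default_factory=list)


@dataclass
class TemporalClassificationQuestion:
    """Radio (one answer) or checklist (several answers) question."""
    name: str
    answers: List[TemporalClassificationAnswer]


# Either kind of classification can appear at any nesting level.
Nested = Union[TemporalClassificationText, TemporalClassificationQuestion]


def depth(node: Nested) -> int:
    """Nesting depth of a classification tree (a text leaf counts as 1)."""
    if isinstance(node, TemporalClassificationText):
        return 1
    child_depths = [
        depth(child) for answer in node.answers for child in answer.classifications
    ]
    return 1 + max(child_depths, default=0)


q = TemporalClassificationQuestion(
    name="speaker",
    answers=[
        TemporalClassificationAnswer(
            name="user",
            frames=[(0, 1200)],
            classifications=[TemporalClassificationText("tone", "calm", [(0, 800)])],
        )
    ],
)
print(depth(q))  # 2
```

Because an answer can hold questions that have their own answers, this structure supports the arbitrary nesting depth listed under Key Features.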
Key Design Principles
Serialization (temporal.py in serialization/ndjson/)
Main Entry Point:
Processing Functions:
_process_text_group(): Handles text classifications; groups by text value
_process_question_group(): Handles radio/checklist; groups by answer name
_process_nested_classifications(): Recursively processes nested structures
_filter_classifications_by_overlap(): Assigns nested classifications based on frame overlap
_frames_overlap(): Checks whether any frames overlap between two sets
_is_frame_subset(): Validates that child frames are within parent frames
Frame Assignment Logic
Parent-Child Relationship:
frames: List[Tuple[int, int]] (multiple discontinuous ranges)
Overlap-Based Assignment:
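Here is a minimal sketch of the overlap helpers listed under Processing Functions. The names frames_overlap, is_frame_subset, and assign_by_overlap are hypothetical stand-ins for _frames_overlap(), _is_frame_subset(), and _filter_classifications_by_overlap(). The sketch assumes inclusive (start_ms, end_ms) ranges and attaches a child to every parent whose frames it overlaps. The PR's exact rules may differ.

```python
from typing import Dict, List, Sequence, Tuple

Range = Tuple[int, int]  # inclusive (start_ms, end_ms)


def ranges_overlap(a: Range, b: Range) -> bool:
    """Two inclusive ranges overlap if neither ends before the other starts."""
    return a[0] <= b[1] and b[0] <= a[1]


def frames_overlap(xs: Sequence[Range], ys: Sequence[Range]) -> bool:
    """True if any range in xs overlaps any range in ys."""
    return any(ranges_overlap(x, y) for x in xs for y in ys)


def is_frame_subset(child: Sequence[Range], parent: Sequence[Range]) -> bool:
    """True if every child range lies entirely inside a single parent range."""
    return all(
        any(p[0] <= c[0] and c[1] <= p[1] for p in parent) for c in child
    )


def assign_by_overlap(
    parents: Dict[str, List[Range]], children: Dict[str, List[Range]]
) -> Dict[str, List[str]]:
    """Map each parent name to the child names whose frames overlap it."""
    return {
        p_name: [c_name for c_name, c_frames in children.items()
                 if frames_overlap(p_frames, c_frames)]
        for p_name, p_frames in parents.items()
    }


# The "user" parent has two discontinuous ranges.
parents = {"user": [(0, 1000), (3000, 4000)], "agent": [(1001, 2999)]}
children = {"calm": [(200, 800)], "angry": [(3500, 3600)], "pause": [(2000, 2100)]}
print(assign_by_overlap(parents, children))
# {'user': ['calm', 'angry'], 'agent': ['pause']}
```

Overlap decides which parent a nested classification belongs to. The stricter subset check can then validate that the child does not extend past its parent's ranges.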
Usage Examples
demo script
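The grouping step done by _process_text_group() (group by text value) and _process_question_group() (group by answer name) can be illustrated by collecting the time ranges of identical values into one entry. This is a hypothetical standalone sketch and is not the PR's demo script. The group_by_value name and the (value, start_ms, end_ms) input tuples are illustrative. Merging adjacent or overlapping ranges is an added assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Range = Tuple[int, int]  # inclusive (start_ms, end_ms)


def group_by_value(
    items: Iterable[Tuple[str, int, int]]
) -> Dict[str, List[Range]]:
    """Group (value, start_ms, end_ms) items so each value lists its ranges.

    Ranges are sorted, and overlapping or adjacent ones are merged, so a
    value annotated over back-to-back spans yields a single range.
    """
    buckets: Dict[str, List[Range]] = defaultdict(list)
    for value, start, end in items:
        buckets[value].append((start, end))

    grouped: Dict[str, List[Range]] = {}
    for value, ranges in buckets.items():
        merged: List[Range] = []
        for start, end in sorted(ranges):
            # Inclusive integer ms: a range starting at previous end + 1 is adjacent.
            if merged and start <= merged[-1][1] + 1:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        grouped[value] = merged
    return grouped


tokens = [("hello", 0, 400), ("world", 401, 900), ("hello", 401, 600), ("hello", 2000, 2300)]
print(group_by_value(tokens))
# {'hello': [(0, 600), (2000, 2300)], 'world': [(401, 900)]}
```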