PTDT-3807: Add temporal audio annotation support #2013
classifications: Optional[
    List[Union["TemporalClassificationText", "TemporalClassificationQuestion"]]
] = None
feature_schema_id: Optional[Cuid] = None
What is feature_schema_id for?
we don't actually use it. Removing it for now
addressed
    List[Union["TemporalClassificationText", "TemporalClassificationQuestion"]]
] = None
feature_schema_id: Optional[Cuid] = None
extra: Dict[str, Any] = Field(default_factory=dict)
What is extra for? Do we use it anywhere?
we don't actually use it. Removing it for now
addressed
yield NDObject.from_common(segments, label.data)

@classmethod
def _create_temporal_annotations(
Should we call it _create_temporal_classifications or smth like this? To differentiate them from temporal annotations (video bbox, polylines, etc)
DISCARD -- If you refer to L74-L78 in this file, there is a function called create video annotations, and this name follows the established structure of this class. Based on this, lmk if you think we should change it; imho keep it as is. Lmk if you have a strong opinion here ---
Changing to Classifications
addressed
dataRow: Dict[str, str]

def create_temporal_ndjson_annotations(
This function probably should also be renamed to create_temporal_ndjson_classifications or create_temporal_ndjson_classification_annotations
hmm, i think classification is better than annotation actually. Classification != Annotation, so we should use classification for this.
addressed
5311011 to 9afd82d
kvilon left a comment
PAT & LGTM
Description
This PR introduces Audio Temporal Annotations - a new feature that enables precise time-based annotations for audio files in the Labelbox SDK. This includes support for temporal classification annotations with millisecond-level timing precision.
Motivation: Audio annotation workflows require precise timing control for applications like:
Context: This feature extends the existing audio annotation infrastructure to support temporal annotations, using a millisecond-based timing system that provides the precision needed for audio applications while maintaining compatibility with the existing NDJSON serialization format.
Type of change
All Submissions
New Feature Submissions
Changes to Core Features
Summary of Changes
New Audio Temporal Annotation Types
AudioClassificationAnnotation: Time-based classifications (radio, checklist, text) for audio segments
Core Infrastructure Updates
TemporalFrame, AnnotationGroupManager, ValueGrouper, and HierarchyBuilder components
temporal.py module with generic components that can be reused for video, audio, and other temporal annotation types
Code Architecture Improvements
Generic[TemporalAnnotation] for compile-time type checking
frame_extractor callable allows different annotation types to use the same processing logic
overlaps() method and improved temporal containment logic
create_audio_ndjson_annotations() convenience function
Testing
test_v3_serialization.py (attached at the bottom) that validates both structure and values
Documentation & Examples
audio.ipynb with temporal annotation examples
demo_audio_token_temporal.py showing per-token temporal annotations
Serialization & Import Support
Key Features
Simple Text Classification
Radio/Checklist with Temporal Ranges
Nested Classifications (Arbitrary Depth)
Serialization to NDJSON
Ontology Setup
Technical Architecture
New Temporal Classification API
The SDK now provides a simplified, recursive temporal classification interface that handles audio, video, and other time-based media:
Core Classes (temporal.py)
TemporalClassificationText
TemporalClassificationQuestion (Radio/Checklist)
TemporalClassificationAnswer
Key Design Principles
Serialization (temporal.py in serialization/ndjson/)
Main Entry Point:
Processing Functions:
_process_text_group(): Handles text classifications, groups by text value
_process_question_group(): Handles radio/checklist, groups by answer name
_process_nested_classifications(): Recursively processes nested structures
_filter_classifications_by_overlap(): Assigns nested classifications based on frame overlap
_frames_overlap(): Checks if any frames overlap between two sets
_is_frame_subset(): Validates child frames are within parent frames
Frame Assignment Logic
Parent-Child Relationship:
frames: List[Tuple[int, int]] (multiple discontinuous ranges)
Overlap-Based Assignment:
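A minimal sketch of how overlap-based assignment could work, assuming each frame is an inclusive (start_ms, end_ms) tuple. The function names echo the helpers _frames_overlap(), _is_frame_subset(), and _filter_classifications_by_overlap() named in this description, but the bodies, signatures, and the inclusive-boundary convention are illustrative assumptions, not the code in this PR.

```python
from typing import Dict, List, Tuple

# Assumption: a frame is an inclusive (start_ms, end_ms) range.
Frame = Tuple[int, int]


def frames_overlap(a: List[Frame], b: List[Frame]) -> bool:
    """Return True if any range in `a` overlaps any range in `b`."""
    return any(s1 <= e2 and s2 <= e1 for s1, e1 in a for s2, e2 in b)


def is_frame_subset(child: List[Frame], parent: List[Frame]) -> bool:
    """Return True if every child range lies inside some parent range."""
    return all(
        any(ps <= cs and ce <= pe for ps, pe in parent)
        for cs, ce in child
    )


def filter_by_overlap(
    candidates: Dict[str, List[Frame]], parent_frames: List[Frame]
) -> List[str]:
    """Keep the names of nested classifications that overlap the parent.

    `candidates` maps a hypothetical classification name to its frames.
    """
    return [
        name
        for name, frames in candidates.items()
        if frames_overlap(frames, parent_frames)
    ]
```

With this convention, a child spanning (900, 1100) overlaps a parent spanning (0, 1000) but is not a subset of it, which is the case a containment check would need to flag.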
Usage Examples
demo script
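As a rough illustration of the "groups by text value" behaviour described for _process_text_group(), here is a standalone sketch. TextAnnotation is a hypothetical stand-in dataclass, not the SDK's TemporalClassificationText, and the output dictionary shape (name, answer, frames with start/end) is an assumption for illustration rather than the actual Labelbox NDJSON schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumption: a frame is a (start_ms, end_ms) range.
Frame = Tuple[int, int]


@dataclass
class TextAnnotation:
    """Hypothetical stand-in for a temporal text classification."""

    name: str
    value: str
    frames: List[Frame]


def group_text_by_value(annotations: List[TextAnnotation]) -> List[dict]:
    """Merge annotations sharing the same name and text value into one row."""
    grouped: Dict[Tuple[str, str], List[Frame]] = defaultdict(list)
    for ann in annotations:
        grouped[(ann.name, ann.value)].extend(ann.frames)
    # One row per distinct (name, value); frames are sorted by start time.
    return [
        {
            "name": name,
            "answer": value,
            "frames": [{"start": s, "end": e} for s, e in sorted(frames)],
        }
        for (name, value), frames in grouped.items()
    ]


rows = group_text_by_value(
    [
        TextAnnotation("transcript", "hello", [(1000, 1500)]),
        TextAnnotation("transcript", "hello", [(0, 500)]),
        TextAnnotation("transcript", "world", [(2000, 2500)]),
    ]
)
```

Here the two "hello" annotations collapse into a single row with two discontinuous frame ranges, while "world" stays in its own row.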