PTDT-3807: Add temporal audio annotation support #2013
Open: rishisurana-labelbox wants to merge 19 commits into develop from rishi/ptdt-3807/temporal-audio-support-sdk
Commits
e4fd630 chore: PoC + ipynb (rishisurana-labelbox)
dbcc7bf chore: use ms instead of s in sdk interface (rishisurana-labelbox)
dbb592f :art: Cleaned (github-actions[bot])
ff298d4 :memo: README updated (github-actions[bot])
16896fd chore: it works for temporal text/radio/checklist classifications (rishisurana-labelbox)
7a666cc chore: clean up and organize code (rishisurana-labelbox)
ac58ad0 chore: update tests fail and documentation update (rishisurana-labelbox)
67dd14a :art: Cleaned (github-actions[bot])
a1600e5 :memo: README updated (github-actions[bot])
b4d2f42 chore: improve imports (rishisurana-labelbox)
fadb14e chore: restore py version (rishisurana-labelbox)
1e12596 chore: restore py version (rishisurana-labelbox)
c2a7b4c chore: cleanup (rishisurana-labelbox)
26a35fd chore: lint (rishisurana-labelbox)
b16f2ea fix: failing build issue due to lint (rishisurana-labelbox)
943cb73 chore: simplify (rishisurana-labelbox)
a838513 chore: update examples - all tests passing (rishisurana-labelbox)
fe950be :art: Cleaned (github-actions[bot])
0b9085d :memo: README updated (github-actions[bot])
@@ -0,0 +1,63 @@
from typing import Optional

from labelbox.data.annotation_types.annotation import (
    ClassificationAnnotation,
    ObjectAnnotation,
)
from labelbox.data.mixins import (
    ConfidenceNotSupportedMixin,
    CustomMetricsNotSupportedMixin,
)


class AudioClassificationAnnotation(ClassificationAnnotation):
    """Audio classification for specific time range

    Examples:
        - Speaker identification from 2500ms to 4100ms
        - Audio quality assessment for a segment
        - Language detection for audio segments

    Args:
        name (Optional[str]): Name of the classification
        feature_schema_id (Optional[Cuid]): Feature schema identifier
        value (Union[Text, Checklist, Radio]): Classification value
        frame (int): The frame index in milliseconds (e.g., 2500 = 2.5 seconds)
        end_frame (Optional[int]): End frame in milliseconds (for time ranges)
        segment_index (Optional[int]): Index of audio segment this annotation belongs to
        extra (Dict[str, Any]): Additional metadata
    """

    frame: int
    end_frame: Optional[int] = None
    segment_index: Optional[int] = None


class AudioObjectAnnotation(
    ObjectAnnotation,
    ConfidenceNotSupportedMixin,
    CustomMetricsNotSupportedMixin,
):
    """Audio object annotation for specific time range

    Examples:
        - Transcription: "Hello world" from 2500ms to 4100ms
        - Sound events: "Dog barking" from 10000ms to 12000ms
        - Audio segments with metadata

    Args:
        name (Optional[str]): Name of the annotation
        feature_schema_id (Optional[Cuid]): Feature schema identifier
        value (Union[TextEntity, Geometry]): Localization or text content
        frame (int): The frame index in milliseconds (e.g., 10000 = 10.0 seconds)
        end_frame (Optional[int]): End frame in milliseconds (for time ranges)
        keyframe (bool): Whether this is a keyframe annotation (default: True)
        segment_index (Optional[int]): Index of audio segment this annotation belongs to
        classifications (Optional[List[ClassificationAnnotation]]): Optional sub-classifications
        extra (Dict[str, Any]): Additional metadata
    """

    frame: int
    end_frame: Optional[int] = None
    keyframe: bool = True
    segment_index: Optional[int] = None
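To illustrate the millisecond-based time fields described in the docstrings, here is a minimal sketch that uses a stand-in dataclass rather than the real labelbox classes. `AudioSpan`, `seconds_to_ms`, and `duration_ms` are hypothetical names, and the range validation is an assumption that the diff does not show; only the `frame`, `end_frame`, and `segment_index` fields mirror the code above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AudioSpan:
    """Hypothetical stand-in mirroring the temporal fields in the diff."""

    name: str
    frame: int  # start time in milliseconds
    end_frame: Optional[int] = None  # end time in milliseconds, if a range
    segment_index: Optional[int] = None

    def __post_init__(self) -> None:
        # Assumed validation; the diff itself does not enforce these checks.
        if self.frame < 0:
            raise ValueError("frame must be non-negative")
        if self.end_frame is not None and self.end_frame < self.frame:
            raise ValueError("end_frame must not precede frame")

    def duration_ms(self) -> int:
        """Length of the range in milliseconds, or 0 for a single point."""
        if self.end_frame is None:
            return 0
        return self.end_frame - self.frame


def seconds_to_ms(seconds: float) -> int:
    """Convert seconds to the integer milliseconds used by `frame`."""
    return round(seconds * 1000)


# Speaker identification from 2500ms to 4100ms, as in the docstring example.
span = AudioSpan(
    name="speaker",
    frame=seconds_to_ms(2.5),
    end_frame=seconds_to_ms(4.1),
)
```

Converting to integer milliseconds at the boundary keeps float seconds out of the annotation fields, which matches the "use ms instead of s" commit in this PR.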
Bug: Temporal Annotation Classification Fails
The NDClassification.to_common method uses a fragile heuristic to distinguish between audio and video temporal annotations. It checks for "frames" in annotation.extra, but both annotation types can contain frame data there. This unreliable check can lead to incorrect classification and downstream processing errors.
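One possible way to avoid this kind of heuristic is to dispatch on an explicit discriminator instead of inspecting `extra`. The sketch below is not the SDK's code: `MediaKind`, `TemporalRecord`, `fragile_is_video`, and `robust_is_video` are hypothetical names, introduced only to contrast a key-presence check with an explicit one.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict


class MediaKind(Enum):
    AUDIO = "audio"
    VIDEO = "video"


@dataclass
class TemporalRecord:
    """Hypothetical serialized annotation; not a labelbox type."""

    name: str
    kind: MediaKind
    extra: Dict[str, Any] = field(default_factory=dict)


def fragile_is_video(record: TemporalRecord) -> bool:
    # Heuristic in the style the review describes: key presence in `extra`.
    return "frames" in record.extra


def robust_is_video(record: TemporalRecord) -> bool:
    # Decide from the explicit discriminator and ignore `extra` contents.
    return record.kind is MediaKind.VIDEO


# An audio record that happens to carry frame data in `extra`.
audio = TemporalRecord(
    name="speaker",
    kind=MediaKind.AUDIO,
    extra={"frames": [{"start": 2500, "end": 4100}]},
)
```

With this record, the key-presence check reports video while the explicit check reports audio, which is the misclassification the comment warns about.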