
Conversation


@rishisurana-labelbox rishisurana-labelbox commented Sep 8, 2025

Description

This PR introduces Audio Temporal Annotations - a new feature that enables precise time-based annotations for audio files in the Labelbox SDK. This includes support for temporal classification annotations with millisecond-level timing precision.

Motivation: Audio annotation workflows require precise timing control for applications like:

  • Podcast transcription with speaker identification
  • Call center quality analysis with word-level annotations
  • Music analysis with temporal classifications
  • Sound event detection with precise timestamps

Context: This feature extends the existing audio annotation infrastructure to support temporal annotations, using a millisecond-based timing system that provides the precision needed for audio applications while maintaining compatibility with the existing NDJSON serialization format.

Type of change

  • New feature (non-breaking change which adds functionality)
  • Document change (fix typo or modifying any markdown files, code comments or anything in the examples folder only)

All Submissions

  • Have you followed the guidelines in our Contributing document?
  • Have you provided a description?
  • Are your changes properly formatted?

New Feature Submissions

  • Does your submission pass tests?
  • Have you added thorough tests for your new feature?
  • Have you commented your code, particularly in hard-to-understand areas?
  • Have you added a Docstring?

Changes to Core Features

  • Have you written new tests for your core changes, as applicable?
  • Have you successfully run tests with your changes locally?
  • Have you updated any code comments, as applicable?

Summary of Changes

New Audio Temporal Annotation Types

  • AudioClassificationAnnotation: Time-based classifications (radio, checklist, text) for audio segments
  • Millisecond-based timing: Direct millisecond input for precise timing control
  • INDEX scope support: Temporal classifications use INDEX scope for frame-based annotations

Core Infrastructure Updates

  • Temporal processor: Added support for audio temporal annotations in NDJSON serialization
  • Frame-based organization: Audio annotations are organized by millisecond frames for efficient processing (a sketch follows this list)
  • MAL compatibility: Audio temporal annotations work with Model-Assisted Labeling pipeline
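
For illustration, here is a minimal sketch of the frame-based grouping idea, assuming only the AudioClassificationAnnotation fields shown in the examples below (frame, end_frame, name, and a Text value); the helper and variable names are ours, not the SDK's internal implementation.

def group_by_frame(annotations):
    """Illustrative only: bucket audio temporal annotations by start frame (ms)."""
    frames_data = []    # list of {"start": ms, "end": ms} ranges
    frame_mapping = {}  # start frame (as str) -> answer text
    for ann in annotations:
        start = ann.frame
        end = getattr(ann, "end_frame", None) or ann.frame  # single-point fallback
        frames_data.append({"start": start, "end": end})
        frame_mapping[str(start)] = ann.value.answer
    return frames_data, frame_mapping

# e.g. group_by_frame(temporal_annotations) -> per-frame ranges plus their answers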

Testing

  • Updated test cases: Enhanced test coverage for audio temporal annotation functionality
  • Integration tests: Audio temporal annotations work with existing import/export pipelines
  • Edge case testing: Precision testing for millisecond timing and mixed annotation types (an illustrative check follows this list)
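
As a flavor of the precision checks involved, here is a minimal pytest-style sketch that uses only the fields shown elsewhere in this PR; it is illustrative, not one of the PR's actual test cases.

import labelbox.types as lb_types

def test_millisecond_timing_is_preserved():
    # Hypothetical check: frame values are plain millisecond integers
    ann = lb_types.AudioClassificationAnnotation(
        frame=2500,      # 2.5 seconds
        end_frame=4100,  # 4.1 seconds
        name="speaker_id",
        value=lb_types.Radio(answer=lb_types.ClassificationAnswer(name="john")),
    )
    assert ann.frame == 2500
    assert ann.end_frame == 4100
    assert ann.end_frame > ann.frame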

Documentation & Examples

  • Updated example notebook: Enhanced audio.ipynb with temporal annotation examples
  • Demo script: Added demo_audio_token_temporal.py showing per-token temporal annotations
  • Use case examples: Word-level speaker identification and temporal classifications
  • Best practices: Guidelines for ontology setup with INDEX scope

Serialization & Import Support

  • NDJSON format: Audio temporal annotations serialize to standard NDJSON format
  • Import pipeline: Full support for audio temporal annotation imports via MAL and Label Import
  • Frame metadata: Millisecond timing preserved in serialized format
  • Backward compatibility: Existing audio annotation workflows unchanged

Key Features

Precise Timing Control

# Millisecond-based timing for precise audio annotation
speaker_annotation = lb_types.AudioClassificationAnnotation(
    frame=2500,  # 2.5 seconds
    end_frame=4100,  # 4.1 seconds
    name="speaker_id",
    value=lb_types.Radio(answer=lb_types.ClassificationAnswer(name="john"))
)
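
Because frame values are plain milliseconds, a tiny helper makes the conversion from seconds explicit; the helper below is illustrative and not part of the SDK.

def sec_to_ms(seconds: float) -> int:
    """Convert seconds to the millisecond frame values used above."""
    return int(round(seconds * 1000))

speaker_annotation = lb_types.AudioClassificationAnnotation(
    frame=sec_to_ms(2.5),      # 2500 ms
    end_frame=sec_to_ms(4.1),  # 4100 ms
    name="speaker_id",
    value=lb_types.Radio(answer=lb_types.ClassificationAnswer(name="john"))
)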

Per-Token Temporal Annotations

# Word-level temporal annotations
tokens_data = [
    ("Hello", 586, 770),    # Hello: frames 586-770
    ("GPT", 771, 955),      # GPT: frames 771-955  
    ("what", 956, 1140),    # what: frames 956-1140
]

temporal_annotations = []
for token, start_frame, end_frame in tokens_data:
    token_annotation = lb_types.AudioClassificationAnnotation(
        frame=start_frame,
        end_frame=end_frame,
        name="User Speaker",
        value=lb_types.Text(answer=token)
    )
    temporal_annotations.append(token_annotation)
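
If per-word timestamps are not available from an ASR system, contiguous ranges like the ones above can be generated from an utterance's overall start and end; this helper is purely illustrative and not part of the PR.

def split_into_token_ranges(tokens, start_ms, end_ms):
    """Evenly split [start_ms, end_ms] into one contiguous range per token."""
    step = (end_ms - start_ms + 1) // len(tokens)
    ranges = []
    for i, token in enumerate(tokens):
        s = start_ms + i * step
        e = end_ms if i == len(tokens) - 1 else s + step - 1
        ranges.append((token, s, e))
    return ranges

tokens_data = split_into_token_ranges(["Hello", "GPT", "what"], 586, 1140)
# -> [("Hello", 586, 770), ("GPT", 771, 955), ("what", 956, 1140)]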

Ontology Setup for Temporal Annotations

# INDEX scope required for temporal classifications
ontology_builder = lb.OntologyBuilder(classifications=[
    lb.Classification(
        class_type=lb.Classification.Type.TEXT,
        name="User Speaker",
        scope=lb.Classification.Scope.INDEX,  # INDEX scope for temporal
    ),
])
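
For contrast, classifications that apply to the whole audio file keep the GLOBAL scope, mirroring the ontology in the demo script under "How to test".

# GLOBAL scope for file-level (non-temporal) classifications
lb.Classification(
    class_type=lb.Classification.Type.RADIO,
    name="overall_quality",
    scope=lb.Classification.Scope.GLOBAL,
    options=[lb.Option(value="excellent"), lb.Option(value="good")],
)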

Label Integration

# Temporal annotations work seamlessly with existing Label infrastructure
label = lb_types.Label(
    data={"global_key": "audio_file.mp3"},
    annotations=[text_annotation, checklist_annotation, radio_annotation] + temporal_annotations
)

# Upload via MAL
upload_job = lb.MALPredictionImport.create_from_objects(
    client=client,
    project_id=project.uid,
    name=f"temporal_mal_job-{str(uuid.uuid4())}",
    predictions=[label],
)
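
After submitting, the import job can be polled and inspected for errors, as the demo script under "How to test" does:

upload_job.wait_till_done()
print(f"Upload state: {upload_job.state}")
if upload_job.errors:
    print(f"Errors: {upload_job.errors}")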

This feature enables the Labelbox SDK to support precise temporal audio annotation workflows while maintaining the same high-quality developer experience as existing audio annotation features.

How to test

  • Ensure you have this branch of python-monorepo (prediction import and worker) checked out and running: rishi/ptdt-3807/temporal-audio-prelabel
  • Run the script below - you will need to set up a venv for this separately
Python script and requirements.txt
#!/usr/bin/env python3

"""
Demo Audio Token Temporal Annotations
Creates temporal annotations for individual spoken tokens with speaker classification.
This demonstrates word-level temporal annotation for a conversation demo.
"""

import labelbox as lb
import uuid
import labelbox.types as lb_types

# Configuration - Update these for your local environment
api_key = "<REDACTED>"
endpoint = "<REDACTED>"  # GraphQL endpoint for your local environment (used by lb.Client below)
rest_endpoint = "http://localhost:3000/api/v1"

print("🎵 Demo Audio Token Temporal Annotations")
print("=" * 50)

# Initialize client
client = lb.Client(
    api_key=api_key,
    endpoint=endpoint,
    rest_endpoint=rest_endpoint
)

# Step 1: Create dataset and upload audio asset
print("\n📁 Step 1: Creating dataset and uploading audio asset...")
global_key = f"audio-token-demo-{str(uuid.uuid4())}"
asset = {
    "row_data": "https://storage.googleapis.com/lb-artifacts-testing-public/audio/gpt_how_can_you_help_me.wav",
    "global_key": global_key,
    "media_type": "AUDIO",
}

dataset = client.create_dataset(
    name=f"audio_token_demo_dataset_{str(uuid.uuid4())[:8]}",
    iam_integration=None
)

task = dataset.create_data_rows([asset])
task.wait_till_done()
print(f"✅ Dataset created and audio uploaded")
print(f"   - Global key: {global_key}")
print(f"   - Failed data rows: {task.failed_data_rows}")

# Step 2: Create ontology with separate speaker classifications
print("\n🛠️ Step 2: Creating ontology with separate User/Assistant speaker classifications...")
ontology_builder = lb.OntologyBuilder(
    tools=[
        # No tools needed for audio temporal - classifications handle everything
    ],
    classifications=[
        # User Speaker classification (INDEX scope for temporal, TEXT input for tokens)
        lb.Classification(
            class_type=lb.Classification.Type.TEXT,
            name="User Speaker",
            scope=lb.Classification.Scope.INDEX,  # KEY: INDEX scope for temporal
        ),
        
        # Assistant Speaker classification (INDEX scope for temporal, TEXT input for tokens) 
        lb.Classification(
            class_type=lb.Classification.Type.TEXT,
            name="Assistant Speaker", 
            scope=lb.Classification.Scope.INDEX,  # KEY: INDEX scope for temporal
        ),
        
        # Global classifications (non-temporal)
        lb.Classification(
            class_type=lb.Classification.Type.RADIO,
            name="overall_quality",
            scope=lb.Classification.Scope.GLOBAL,  # Global scope
            options=[
                lb.Option(value="excellent"),
                lb.Option(value="good"),
                lb.Option(value="fair"),
                lb.Option(value="poor"),
            ],
        ),
        
        lb.Classification(
            class_type=lb.Classification.Type.TEXT,
            name="notes",
            scope=lb.Classification.Scope.GLOBAL,  # Global scope
        ),
    ],
)

ontology = client.create_ontology(
    f"Audio Token Demo Ontology {str(uuid.uuid4())[:8]}",
    ontology_builder.asdict(),
    media_type=lb.MediaType.Audio,
)
print(f"✅ Ontology created: {ontology.name}")

# Step 3: Create project and connect ontology
print("\n📋 Step 3: Creating project...")
project = client.create_project(
    name=f"Audio Token Demo {str(uuid.uuid4())[:8]}",
    media_type=lb.MediaType.Audio
)

project.connect_ontology(ontology)
print(f"✅ Project created: {project.name}")
print(f"🆔 Project ID: {project.uid}")

# Step 4: Get data row ID and create batch
print("\n📦 Step 4: Creating batch...")

# Get the actual data row object to ensure strong association
data_row = None
for dr in dataset.data_rows():
    if dr.global_key == global_key:
        data_row = dr
        break

if data_row is None:
    print("❌ Could not find data row!")
    exit(1)

print(f"   - Found data row ID: {data_row.uid}")

# Create batch via the global key (data row ID resolved above for reference)
batch = project.create_batch(
    f"audio-token-batch-{str(uuid.uuid4())[:8]}",
    global_keys=[global_key],
    priority=5,
)
print(f"✅ Batch created: {batch.name}")
print(f"   - Data row ID: {data_row.uid}")
print(f"   - Global key: {global_key}")

# Step 5: Create separate speaker temporal annotations
print("\n🎨 Step 5: Creating separate User Speaker temporal annotations...")
print("   Creating per-token temporal annotations for: 'Hello GPT what can you do for me today'")
print("   Using separate User Speaker classifications for each token")
print("   Using sequential time ranges (connected word segments)")

# Define the tokens with non-overlapping time ranges (each range = 1 token)
# Ranges are contiguous: each start is the previous end + 1, so tokens never overlap
tokens_data = [
    ("Hello", 586, 770),    # Hello: frames 586-770
    ("GPT", 771, 955),      # GPT: frames 771-955  
    ("what", 956, 1140),    # what: frames 956-1140
    ("can", 1141, 1325),    # can: frames 1141-1325
    ("you", 1326, 1510),    # you: frames 1326-1510
    ("do", 1511, 1695),     # do: frames 1511-1695
    ("for", 1696, 1880),    # for: frames 1696-1880
    ("me", 1881, 2066),     # me: frames 1881-2066 (end of audio)
]

# Create temporal annotations
annotations = []

# Create separate User Speaker annotations for each token with time ranges
for token, start_frame, end_frame in tokens_data:
    user_speaker_annotation = lb_types.AudioClassificationAnnotation(
        frame=start_frame,      # Start frame for this token
        end_frame=end_frame,    # End frame for this token
        name="User Speaker",
        value=lb_types.Text(answer=token)
    )
    annotations.append(user_speaker_annotation)

# Note: No Assistant Speaker annotations in this demo (ontology includes it but no data)

# Global annotations
global_annotations = [
    lb_types.ClassificationAnnotation(
        name="overall_quality",
        value=lb_types.Radio(answer=lb_types.ClassificationAnswer(name="excellent"))
    ),
    lb_types.ClassificationAnnotation(
        name="notes",
        value=lb_types.Text(answer="Demo conversation showing word-level temporal annotation with speaker identification.")
    ),
]

# Combine all annotations
all_annotations = annotations + global_annotations

print(f"✅ Created {len(all_annotations)} total annotations")
print(f"   - User Speaker annotations: {len(tokens_data)} (one per token with time ranges)")
print(f"   - Assistant Speaker annotations: 0 (ontology includes it but no data)")
print(f"   - Global annotations: {len(global_annotations)}")

# Step 6: Create label and upload via MAL
print("\n📤 Step 6: Uploading annotations via MAL...")

label = lb_types.Label(
    data={"global_key": global_key},
    annotations=all_annotations
)

# Upload MAL predictions
try:
    upload_job = lb.MALPredictionImport.create_from_objects(
        client=client,
        project_id=project.uid,
        name=f"audio_token_mal_{str(uuid.uuid4())[:8]}",
        predictions=[label],
    )
    
    print("⏳ Waiting for MAL upload to complete...")
    upload_job.wait_till_done()
    
    print(f"✅ MAL Upload State: {upload_job.state}")
    print(f"📊 Upload Statuses: {upload_job.statuses}")
    if upload_job.errors:
        print(f"❌ Errors: {upload_job.errors}")
    else:
        print(f"ℹ️  No errors reported in upload_job.errors")
    
    # Print more detailed status info
    for status in upload_job.statuses:
        if status.get('status') == 'FAILURE':
            print(f"🔍 FAILURE Details: {status}")
            if 'errors' in status:
                print(f"🔍 Specific Errors: {status['errors']}")
    
    # Check if upload was successful
    upload_successful = upload_job.state in ["COMPLETE", "FINISHED"] or "FINISHED" in str(upload_job.state)
    
except Exception as e:
    print(f"❌ MAL upload failed: {e}")

# Step 7: Verification
print("\n🔍 Step 7: Verification...")
try:
    # Verify batch exists and has correct size
    print(f"   - Batch: {batch.name}")
    print(f"   - Batch size: {batch.size}")
    print(f"   - Project: {project.name}")
    print(f"   - Project ID: {project.uid}")
    
    # Don't rely on broken overview - just confirm core operations worked
    if batch.size > 0:
        print("✅ SUCCESS: Batch created with data row")
    
    if upload_successful:
        print("✅ SUCCESS: MAL upload completed successfully")
        print("✅ SUCCESS: Audio token temporal annotations are working!")
    
except Exception as e:
    print(f"⚠️  Verification had issues: {e}")

print(f"\n🎉 Audio Token Temporal Annotations Demo Complete!")
print("=" * 50)
print(f"📋 Summary:")
print(f"   - Audio asset uploaded successfully")
print(f"   - Ontology created with speaker and token classifications") 
print(f"   - Project and batch created successfully")
print(f"   - {len(all_annotations)} annotations created with precise timing")
print(f"   - MAL upload successful")

print(f"\n🎯 Demo Details:")
print(f"   - Structure: Two separate speaker classifications (User Speaker, Assistant Speaker)")
print(f"   - Data: User Speaker tokens only (8 words with connected time ranges)")
for token, start_frame, end_frame in tokens_data:
    print(f"     • User Speaker: '{token}' frames {start_frame}-{end_frame}")

print(f"\n🌐 View Project in Browser:")
print(f"   - Project ID: {project.uid}")
print(f"   - URL: http://localhost:3000/projects/{project.uid}/overview")

print(f"\n✨ Key Features Demonstrated:")
print(f"   - Separate speaker classifications (no shared root)")
print(f"   - Word-level temporal annotation")
print(f"   - Sequential time range positioning")
print(f"   - Conversation analysis with speaker separation")
print(f"   - Ontology includes both speakers but data only for User Speaker")

requirements.txt:

# Requirements for exec folder scripts
# Install the local labelbox package in development mode
-e ../libs/labelbox

# Additional dependencies that might be needed
requests>=2.25.0
# uuid is part of the Python standard library; no separate install needed

# Dependencies for labelbox.types (audio annotations) - from labelbox[data]
shapely>=2.0.3
numpy>=1.25.0
pillow>=10.2.0
typeguard>=4.1.5
imagesize>=1.4.1
pyproj>=3.5.0
pygeotile>=1.0.6
typing-extensions>=4.10.0
opencv-python-headless>=4.9.0.80



all changes made by GH bot to this file..

start, end = ann.frame, getattr(ann, 'end_frame', None) or ann.frame
frames_data.append({"start": start, "end": end})
frame_mapping[str(start)] = ann.value.answer


Bug: Annotation Value Handling Fails for Complex Types

The _has_changing_values and _create_multi_value_annotation methods incorrectly assume annotation.value.answer is a simple string. This leads to comparison failures and improper JSON serialization for Radio and Checklist classifications, where value.answer is an object or a list.
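
One possible direction, as a hedged sketch only: normalize value.answer to a comparable representation before comparing or serializing, rather than assuming it is a string. The helper name below is hypothetical.

def _answer_key(value):
    """Hypothetical: normalize Text/Radio/Checklist answers for comparison."""
    answer = value.answer
    if isinstance(answer, str):    # Text -> plain string
        return answer
    if isinstance(answer, list):   # Checklist -> list of answer objects
        return tuple(getattr(a, "name", a) for a in answer)
    return getattr(answer, "name", answer)  # Radio -> single answer object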

