rishisurana-labelbox commented Sep 8, 2025

Description

This PR introduces Audio Temporal Annotations - a new feature that enables precise time-based annotations for audio files in the Labelbox SDK. This includes support for temporal classification annotations with millisecond-level timing precision.

Motivation: Audio annotation workflows require precise timing control for applications like:

  • Podcast transcription with speaker identification
  • Call center quality analysis with word-level annotations
  • Music analysis with temporal classifications
  • Sound event detection with precise timestamps

Context: This feature extends the existing audio annotation infrastructure to support temporal annotations, using a millisecond-based timing system that provides the precision needed for audio applications while maintaining compatibility with the existing NDJSON serialization format.

Type of change

  • New feature (non-breaking change which adds functionality)
  • Document change (fix typo or modifying any markdown files, code comments or anything in the examples folder only)

All Submissions

  • Have you followed the guidelines in our Contributing document?
  • Have you provided a description?
  • Are your changes properly formatted?

New Feature Submissions

  • Does your submission pass tests?
  • Have you added thorough tests for your new feature?
  • Have you commented your code, particularly in hard-to-understand areas?
  • Have you added a Docstring?

Changes to Core Features

  • Have you written new tests for your core changes, as applicable?
  • Have you successfully run tests with your changes locally?
  • Have you updated any code comments, as applicable?

Summary of Changes

New Audio Temporal Annotation Types

  • AudioClassificationAnnotation: Time-based classifications (radio, checklist, text) for audio segments
  • Millisecond-based timing: Direct millisecond input for precise timing control
  • INDEX scope support: Temporal classifications use INDEX scope for frame-based annotations

Core Infrastructure Updates

  • Generic temporal processing: Refactored audio-specific logic into reusable TemporalFrame, AnnotationGroupManager, ValueGrouper, and HierarchyBuilder components
  • Modular architecture: Created temporal.py module with generic components that can be reused for video, audio, and other temporal annotation types
  • Frame-based organization: Temporal annotations organized by millisecond frames for efficient processing
  • MAL compatibility: Audio temporal annotations work with Model-Assisted Labeling pipeline

Code Architecture Improvements

  • Separation of concerns: Extracted complex nested logic into focused, single-purpose components
  • Type safety: Generic components with Generic[TemporalAnnotation] for compile-time type checking
  • Configurable frame extraction: frame_extractor callable allows different annotation types to use the same processing logic
  • Enhanced frame operations: Added overlaps() method and improved temporal containment logic
  • Backward compatibility: Audio usage remains unchanged via create_audio_ndjson_annotations() convenience function

Testing

  • Comprehensive serialization test script: Added test_v3_serialization.py (attached at the bottom), which validates both structure and values
  • Updated test cases: Enhanced test coverage for audio temporal annotation functionality
  • Integration tests: Audio temporal annotations work with existing import/export pipelines
  • Edge case testing: Precision testing for millisecond timing and mixed annotation types
  • Value validation: Tests verify that all annotation values and frame ranges are preserved correctly

Documentation & Examples

  • Updated example notebook: Enhanced audio.ipynb with temporal annotation examples
  • Demo script: Added demo_audio_token_temporal.py showing per-token temporal annotations
  • Use case examples: Word-level speaker identification and temporal classifications
  • Best practices: Guidelines for ontology setup with INDEX scope

Serialization & Import Support

  • NDJSON format: Audio temporal annotations serialize to standard NDJSON format with hierarchical structure
  • Import pipeline: Full support for audio temporal annotation imports via MAL and Label Import
  • Frame metadata: Millisecond timing preserved in serialized format
  • Backward compatibility: Existing audio annotation workflows unchanged
  • Nested classification support: Complex hierarchical temporal classifications with proper containment logic

Key Features

Simple Text Classification

from labelbox.data.annotation_types.temporal import TemporalClassificationText

# Multiple text values at different time ranges
transcription = TemporalClassificationText(
    name="transcription",
    value=[
        (1000, 1100, "Hello"),                      # 1.0s - 1.1s
        (1500, 2400, "How can I help you today?"),  # 1.5s - 2.4s
        (2500, 2700, "Thank you"),                  # 2.5s - 2.7s
    ]
)
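
Many transcription tools report word timings in seconds rather than milliseconds. The sketch below shows one way to produce the (start_ms, end_ms, text) tuples used above; `to_ms_values` and `asr_words` are hypothetical names for illustration and are not part of the SDK.

```python
from typing import List, Tuple


def to_ms_values(
    words: List[Tuple[float, float, str]],
) -> List[Tuple[int, int, str]]:
    """Convert (start_s, end_s, text) triples to (start_ms, end_ms, text)."""
    values = []
    for start_s, end_s, text in words:
        # Round instead of truncating so 1.1 s becomes 1100 ms, not 1099 ms.
        start_ms = round(start_s * 1000)
        end_ms = round(end_s * 1000)
        if end_ms < start_ms:
            raise ValueError(f"end before start for {text!r}")
        values.append((start_ms, end_ms, text))
    return values


# Hypothetical output of a speech-to-text step, in seconds.
asr_words = [(1.0, 1.1, "Hello"), (1.5, 2.4, "How can I help you today?")]
ms_values = to_ms_values(asr_words)
```

The resulting list can be passed as the `value` argument of TemporalClassificationText.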

Radio/Checklist with Temporal Ranges

from labelbox.data.annotation_types.temporal import (
    TemporalClassificationQuestion,
    TemporalClassificationAnswer
)

# Radio: single answer with discontinuous time ranges
speaker = TemporalClassificationQuestion(
    name="speaker",
    value=[
        TemporalClassificationAnswer(
            name="user",
            frames=[(200, 1500), (2000, 2500)]  # User speaks in 2 segments
        )
    ]
)

# Checklist: multiple answers with their own time ranges
audio_quality = TemporalClassificationQuestion(
    name="audio_quality",
    value=[
        TemporalClassificationAnswer(
            name="background_noise",
            frames=[(300, 800), (1200, 1800)]
        ),
        TemporalClassificationAnswer(
            name="echo",
            frames=[(2200, 2900)]
        )
    ]
)

Nested Classifications (Arbitrary Depth)

# Text → Text → Text (3 levels deep)
transcription_with_notes = TemporalClassificationText(
    name="transcription",
    value=[(1500, 2400, "How can I help you today?")],
    classifications=[
        TemporalClassificationText(
            name="speaker_notes",
            value=[(1600, 2000, "Polite greeting")],
            classifications=[
                TemporalClassificationText(
                    name="context_tags",
                    value=[(1800, 2000, "customer service tone")]
                )
            ]
        )
    ]
)

# Radio → Radio → Radio (3 levels deep)
speaker_with_tone = TemporalClassificationQuestion(
    name="speaker",
    value=[
        TemporalClassificationAnswer(
            name="user",
            frames=[(200, 1600)],
            classifications=[
                TemporalClassificationQuestion(
                    name="tone",
                    value=[
                        TemporalClassificationAnswer(
                            name="professional",
                            frames=[(1000, 1600)],
                            classifications=[
                                TemporalClassificationQuestion(
                                    name="clarity",
                                    value=[
                                        TemporalClassificationAnswer(
                                            name="clear",
                                            frames=[(1300, 1500)]
                                        )
                                    ]
                                )
                            ]
                        )
                    ]
                )
            ]
        )
    ]
)

Serialization to NDJSON

from labelbox.data.serialization.ndjson.temporal import create_temporal_ndjson_classifications

annotations = [transcription, speaker, audio_quality]

# Convert to NDJSON format for MAL upload
ndjson_predictions = create_temporal_ndjson_classifications(
    annotations=annotations,
    data_global_key="audio_file.mp3"
)

# Upload predictions
predictions_for_upload = [
    pred.model_dump(by_alias=True, exclude_none=True)
    for pred in ndjson_predictions
]
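
Before uploading, it can help to inspect the payload as newline-delimited JSON. The sketch below is a standard-library illustration only: `to_ndjson` is a hypothetical helper, and the two rows are placeholders rather than the real serialized output of this PR.

```python
import json
from typing import Any, Dict, List


def to_ndjson(rows: List[Dict[str, Any]]) -> str:
    """Serialize each dict as one compact JSON document per line."""
    return "\n".join(json.dumps(row, separators=(",", ":")) for row in rows)


# Placeholder rows standing in for predictions_for_upload.
example_rows = [
    {"name": "transcription", "dataRow": {"globalKey": "audio_file.mp3"}},
    {"name": "speaker", "dataRow": {"globalKey": "audio_file.mp3"}},
]
ndjson_text = to_ndjson(example_rows)
```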

Ontology Setup

import labelbox as lb

# INDEX scope required for temporal classifications
ontology_builder = lb.OntologyBuilder(classifications=[
    lb.Classification(
        class_type=lb.Classification.Type.TEXT,
        name="transcription",
        scope=lb.Classification.Scope.INDEX,  # INDEX for temporal
        options=[
            lb.Classification(
                class_type=lb.Classification.Type.TEXT,
                name="speaker_notes",
            )
        ]
    ),
    lb.Classification(
        class_type=lb.Classification.Type.RADIO,
        name="speaker",
        scope=lb.Classification.Scope.INDEX,
        options=[
            lb.Option("user", options=[
                lb.Classification(
                    class_type=lb.Classification.Type.RADIO,
                    name="tone",
                    options=[lb.Option("professional"), lb.Option("casual")]
                )
            ]),
            lb.Option("assistant")
        ]
    ),
])

Technical Architecture

New Temporal Classification API

The SDK now provides a simplified, recursive temporal classification interface that handles audio, video, and other time-based media:

Core Classes (temporal.py)

TemporalClassificationText

class TemporalClassificationText(BaseModel):
    """Text classification with multiple temporal values."""
    name: str
    value: List[Tuple[int, int, str]]  # [(start_ms, end_ms, text_value), ...]
    classifications: Optional[List[Union[
        TemporalClassificationText,
        TemporalClassificationQuestion
    ]]] = None

TemporalClassificationQuestion (Radio/Checklist)

class TemporalClassificationQuestion(BaseModel):
    """Radio or Checklist question with temporal answers."""
    name: str
    value: List[TemporalClassificationAnswer]  # Radio: 1 answer, Checklist: many
    classifications: Optional[List[Union[
        TemporalClassificationText,
        TemporalClassificationQuestion
    ]]] = None

TemporalClassificationAnswer

class TemporalClassificationAnswer(BaseModel):
    """Answer option with discontinuous frame ranges."""
    name: str
    frames: List[Tuple[int, int]] = []  # [(start_ms, end_ms), ...]
    classifications: Optional[List[Union[
        TemporalClassificationText,
        TemporalClassificationQuestion
    ]]] = None

Key Design Principles

  1. Recursive Structure: All three classes support arbitrary nesting depth
  2. Frame Validation: Nested classifications must have frames within parent frames
  3. Discontinuous Ranges: Answers can span multiple non-contiguous time segments
  4. Type Agnostic: Text, Radio, and Checklist use the same nested structure
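
Principles 1 and 2 can be illustrated with a small recursive check. This is a minimal sketch, not the SDK's validation code: `Node`, `contained`, and `find_violations` are hypothetical names, and treating both range endpoints as inclusive is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Frame = Tuple[int, int]  # (start_ms, end_ms), both ends inclusive (assumption)


@dataclass
class Node:
    """Hypothetical stand-in for a temporal classification or answer."""
    name: str
    frames: List[Frame]
    children: List["Node"] = field(default_factory=list)


def contained(child: Frame, parents: List[Frame]) -> bool:
    """True if the child range lies inside at least one parent range."""
    return any(p[0] <= child[0] and child[1] <= p[1] for p in parents)


def find_violations(node: Node, path: str = "") -> List[str]:
    """Walk the tree and list every child range that escapes its parent."""
    here = f"{path}/{node.name}" if path else node.name
    problems = []
    for child in node.children:
        for frame in child.frames:
            if not contained(frame, node.frames):
                problems.append(f"{here}/{child.name}: {frame}")
        problems.extend(find_violations(child, here))
    return problems


# Mirrors the Radio → Radio → Radio example: every level stays inside its parent.
tree = Node("speaker:user", [(200, 1600)], [
    Node("tone:professional", [(1000, 1600)], [
        Node("clarity:clear", [(1300, 1500)]),
    ]),
])
# Same shape, but the nested range runs 100 ms past the parent.
bad_tree = Node("speaker:user", [(200, 1600)], [
    Node("tone:professional", [(1000, 1700)]),
])
```

Here `find_violations(tree)` returns an empty list, while `find_violations(bad_tree)` reports the one range that extends past its parent.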

Serialization (temporal.py in serialization/ndjson/)

Main Entry Point:

def create_temporal_ndjson_classifications(
    annotations: List[Union[TemporalClassificationText, TemporalClassificationQuestion]],
    data_global_key: str,
) -> List[TemporalNDJSON]:
    """
    Converts temporal classifications to NDJSON format.

    - Groups annotations by name
    - Validates frame containment
    - Merges discontinuous ranges
    - Recursively processes nested classifications
    """

Processing Functions:

  • _process_text_group(): Handles text classifications, groups by text value
  • _process_question_group(): Handles radio/checklist, groups by answer name
  • _process_nested_classifications(): Recursively processes nested structures
  • _filter_classifications_by_overlap(): Assigns nested classifications based on frame overlap
  • _frames_overlap(): Checks if any frames overlap between two sets
  • _is_frame_subset(): Validates child frames are within parent frames

Frame Assignment Logic

Parent-Child Relationship:

  • Each node in the nested tree carries frames: List[Tuple[int, int]], which may hold multiple discontinuous ranges
  • Nested frames MUST be subsets of parent frames
  • Orphaned classifications (invalid frames) are logged as warnings and discarded

Overlap-Based Assignment:

  • Nested classifications are matched to parents by frame overlap
  • A nested classification is assigned if ANY of its frames overlap with ANY parent frame
  • Multiple text values/answers can share the same nested classifications if frames overlap
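
The overlap rule can be sketched as follows. This is a simplified illustration, not the code behind _filter_classifications_by_overlap(): `overlaps` and `assign_nested` are hypothetical names, and inclusive endpoints are an assumption.

```python
from typing import Dict, List, Tuple

Frame = Tuple[int, int]  # (start_ms, end_ms), both ends inclusive (assumption)


def overlaps(a: Frame, b: Frame) -> bool:
    """True if two closed ranges share at least one millisecond."""
    return a[0] <= b[1] and b[0] <= a[1]


def assign_nested(
    parents: Dict[str, List[Frame]],
    nested: Dict[str, List[Frame]],
) -> Dict[str, List[str]]:
    """Attach each nested classification to every parent it overlaps."""
    assignment = {}
    for parent_name, parent_frames in parents.items():
        assignment[parent_name] = [
            child_name
            for child_name, child_frames in nested.items()
            if any(overlaps(c, p) for c in child_frames for p in parent_frames)
        ]
    return assignment


parents = {
    "Hello": [(1000, 1100)],
    "How can I help you today?": [(1500, 2400)],
}
nested = {
    "speaker_notes": [(1600, 2000)],        # overlaps the second value only
    "background_noise": [(1050, 1700)],     # overlaps both values
}
assignment = assign_nested(parents, nested)
```

In this example "background_noise" is shared by both parent values, while "speaker_notes" is attached only to the second.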

Usage Examples

Demo script
#!/usr/bin/env python3

import labelbox as lb
import uuid
from labelbox.data.serialization.ndjson.label import NDLabel
import labelbox.types as lb_types
from labelbox.data.annotation_types.label import Label
import argparse

# Parse command line arguments
parser = argparse.ArgumentParser(description='Compare and optionally upload temporal audio classifications')
parser.add_argument('--upload-raw', action='store_true', help='Upload raw predictions from actual.py')
parser.add_argument('--upload-computed', action='store_true', help='Upload computed predictions from class-based API')
parser.add_argument('--env', type=str, choices=['local', 'prod'], default='local', help='Environment to upload to (local or prod)')
args = parser.parse_args()



asset = {
    "row_data": "https://storage.googleapis.com/lb-artifacts-testing-public/audio/gpt_how_can_you_help_me.wav",
    "global_key": "123",
    "media_type": "AUDIO",
}

# Step 2: Create ontology with temporal classifications (SAME AS NDJSON VERSION)
ontology_builder = lb.OntologyBuilder(
    tools=[
        # No tools needed for audio temporal - classifications handle everything
    ],
    classifications=[
        # Text with nested classifications
        lb.Classification(
            class_type=lb.Classification.Type.TEXT,
            name="text_class",
            scope=lb.Classification.Scope.INDEX,
            options=[
                lb.Classification(
                    class_type=lb.Classification.Type.TEXT,
                    name="text_text",
                    options=[
                        lb.Classification(
                            class_type=lb.Classification.Type.RADIO,
                            name="text_text_radio",
                            options=[
                                lb.Option("text_text_radio_option_1"),
                                lb.Option("text_text_radio_option_2"),
                            ],
                        ),
                        lb.Classification(
                            class_type=lb.Classification.Type.TEXT,
                            name="text_text_text",
                        ),
                    ]
                ),
                lb.Classification(
                    class_type=lb.Classification.Type.RADIO,
                    name="text_radio",
                    options=[
                        lb.Option(
                            "text_radio_option_1",
                            options=[
                                lb.Classification(
                                    class_type=lb.Classification.Type.CHECKLIST,
                                    name="text_radio_checklist",
                                    options=[
                                        lb.Option("text_radio_checklist_option_1"),
                                        lb.Option("text_radio_checklist_option_2"),
                                        lb.Option("text_radio_checklist_option_3"),
                                    ]
                                )
                            ]
                        ),
                        lb.Option("text_radio_option_2"),
                    ]
                ),
                lb.Classification(
                    class_type=lb.Classification.Type.CHECKLIST,
                    name="text_checklist",
                    options=[
                        lb.Option(
                            "text_checklist_option_1",
                            options=[
                                lb.Classification(
                                    class_type=lb.Classification.Type.TEXT,
                                    name="text_checklist_text",
                                )
                            ]
                        ),
                        lb.Option("text_checklist_option_2"),
                        lb.Option("text_checklist_option_3"),
                    ]
                )
            ],
        ),

        # Radio with nested classifications
        lb.Classification(
            class_type=lb.Classification.Type.RADIO,
            name="radio_class",
            scope=lb.Classification.Scope.INDEX,
            options=[
                lb.Option(
                    "radio_class_option_1",
                    options=[
                        lb.Classification(
                            class_type=lb.Classification.Type.TEXT,
                            name="radio_text",
                            options=[
                                lb.Classification(
                                    class_type=lb.Classification.Type.CHECKLIST,
                                    name="radio_text_checklist",
                                    options=[
                                        lb.Option("option_1_radio_text_checklist"),
                                        lb.Option("option_2_radio_text_checklist"),
                                        lb.Option("option_3_radio_text_checklist"),
                                    ],
                                )
                            ]
                        )
                    ],
                ),
                lb.Option(
                    "radio_class_option_2",
                    options=[
                        lb.Classification(
                            class_type=lb.Classification.Type.RADIO,
                            name="radio_radio",
                            options=[
                                lb.Option(
                                    "radio_radio_option_1",
                                    options=[
                                        lb.Classification(
                                            class_type=lb.Classification.Type.TEXT,
                                            name="radio_radio_text",
                                        )
                                    ]
                                ),
                                lb.Option("radio_radio_option_2"),
                            ],
                        )
                    ]
                ),
                lb.Option(
                    "radio_class_option_3",
                    options=[
                        lb.Classification(
                            class_type=lb.Classification.Type.CHECKLIST,
                            name="radio_checklist",
                            options=[
                                lb.Option(
                                    "radio_checklist_option_1",
                                    options=[
                                        lb.Classification(
                                            class_type=lb.Classification.Type.RADIO,
                                            name="radio_checklist_radio",
                                            options=[
                                                lb.Option("radio_checklist_radio_option_1"),
                                                lb.Option("radio_checklist_radio_option_2"),
                                            ],
                                        )
                                    ]
                                ),
                                lb.Option("radio_checklist_option_2"),
                            ],
                        )
                    ]
                ),
            ],
        ),

        # Checklist with nested classifications
        lb.Classification(
            class_type=lb.Classification.Type.CHECKLIST,
            name="checklist_class",
            scope=lb.Classification.Scope.INDEX,
            options=[
                lb.Option(
                    "checklist_class_option_1",
                    options=[
                        lb.Classification(
                            class_type=lb.Classification.Type.TEXT,
                            name="checklist_text",
                            options=[
                                lb.Classification(
                                    class_type=lb.Classification.Type.TEXT,
                                    name="checklist_text_text"
                                )
                            ]
                        )
                    ],
                ),
                lb.Option(
                    "checklist_class_option_2",
                    options=[
                        lb.Classification(
                            class_type=lb.Classification.Type.RADIO,
                            name="checklist_radio",
                            options=[
                                lb.Option(
                                    "option_1_checklist_radio",
                                    options=[
                                        lb.Classification(
                                            class_type=lb.Classification.Type.RADIO,
                                            name="checklist_radio_radio",
                                            options=[
                                                lb.Option("option_1_checklist_radio_radio"),
                                                lb.Option("option_2_checklist_radio_radio"),
                                            ],
                                        )
                                    ]
                                ),
                                lb.Option("option_2_checklist_radio"),
                            ],
                        )
                    ],
                ),
                lb.Option(
                    "checklist_class_option_3",
                    options=[
                        lb.Classification(
                            class_type=lb.Classification.Type.CHECKLIST,
                            name="checklist_checklist",
                            options=[
                                lb.Option(
                                    "option_1_checklist_checklist",
                                    options=[
                                        lb.Classification(
                                            class_type=lb.Classification.Type.CHECKLIST,
                                            name="checklist_checklist_checklist",
                                            options=[
                                                lb.Option("option_1_checklist_checklist_checklist"),
                                                lb.Option("option_2_checklist_checklist_checklist"),
                                                lb.Option("option_3_checklist_checklist_checklist"),
                                            ]
                                        )
                                    ]
                                ),
                                lb.Option("option_2_checklist_checklist"),
                                lb.Option("option_3_checklist_checklist"),
                            ],
                        )
                    ],
                ),
            ],
        ),
    ],
)




# Step 5: Create temporal classifications using class-based API
# TEXT_CLASS: Text > Text/Radio/Checklist branches
text_class_annotation = lb_types.TemporalClassificationText(
    name="text_class",
    value=[
        # text_text branch with nested radio
        (100, 1200, "top text with text classification"),
        # text_checklist branch with nested text
        (1400, 2400, "top level text with checklist classification"),
        # text_radio branch with nested checklist
        (2600, 3500, "text with radio"),
    ],
    classifications=[
        # text_text nested classification
        lb_types.TemporalClassificationText(
            name="text_text",
            value=[
                (200, 1000, "nested text classification with nested radio"),
            ],
            classifications=[
                # text_text_radio nested radio
                lb_types.TemporalClassificationQuestion(
                    name="text_text_radio",
                    value=[
                        lb_types.TemporalClassificationAnswer(
                            name="text_text_radio_option_1",
                            frames=[(300, 500)],
                        ),
                        lb_types.TemporalClassificationAnswer(
                            name="text_text_radio_option_2",
                            frames=[(501, 900)],
                        ),
                    ],
                ),
                # text_text_text nested text
                lb_types.TemporalClassificationText(
                    name="text_text_text",
                    value=[
                        (900, 1000, "text_text_text value"),
                    ],
                ),
            ],
        ),
        # text_checklist nested classification
        lb_types.TemporalClassificationQuestion(
            name="text_checklist",
            value=[
                lb_types.TemporalClassificationAnswer(
                    name="text_checklist_option_1",
                    frames=[(1500, 2000)],
                    classifications=[
                        lb_types.TemporalClassificationText(
                            name="text_checklist_text",
                            value=[
                                (1600, 2000, "text / checklist / option 1 / text classification value"),
                            ],
                        ),
                    ],
                ),
                lb_types.TemporalClassificationAnswer(
                    name="text_checklist_option_2",
                    frames=[(1800, 2200)],
                ),
            ],
        ),
        # text_radio nested classification
        lb_types.TemporalClassificationQuestion(
            name="text_radio",
            value=[
                lb_types.TemporalClassificationAnswer(
                    name="text_radio_option_1",
                    frames=[(2700, 3200)],
                    classifications=[
                        lb_types.TemporalClassificationQuestion(
                            name="text_radio_checklist",
                            value=[
                                lb_types.TemporalClassificationAnswer(
                                    name="text_radio_checklist_option_1",
                                    frames=[(2800, 3100)],
                                ),
                                lb_types.TemporalClassificationAnswer(
                                    name="text_radio_checklist_option_3",
                                    frames=[(2900, 3200)],
                                ),
                            ],
                        ),
                    ],
                ),
                lb_types.TemporalClassificationAnswer(
                    name="text_radio_option_2",
                    frames=[(3201, 3500)],
                ),
            ],
        ),
    ],
)

# RADIO_CLASS: Radio > Text/Radio/Checklist branches
radio_class_annotation = lb_types.TemporalClassificationQuestion(
    name="radio_class",
    value=[
        lb_types.TemporalClassificationAnswer(
            name="radio_class_option_1",
            frames=[
                (200, 700),
                (1401, 2500),
            ],
            classifications=[
                lb_types.TemporalClassificationText(
                    name="radio_text",
                    value=[
                        (300, 700, "radio_text value"),
                        (1401, 2500, "radio_text value"),
                    ],
                    classifications=[
                        lb_types.TemporalClassificationQuestion(
                            name="radio_text_checklist",
                            value=[
                                lb_types.TemporalClassificationAnswer(
                                    name="option_1_radio_text_checklist",
                                    frames=[(1401, 2099)],
                                ),
                                lb_types.TemporalClassificationAnswer(
                                    name="option_2_radio_text_checklist",
                                    frames=[(1600, 2299)],
                                ),
                                lb_types.TemporalClassificationAnswer(
                                    name="option_3_radio_text_checklist",
                                    frames=[(1900, 2500)],
                                ),
                            ],
                        ),
                    ],
                ),
            ],
        ),
        lb_types.TemporalClassificationAnswer(
            name="radio_class_option_2",
            frames=[(701, 1400)],
            classifications=[
                lb_types.TemporalClassificationQuestion(
                    name="radio_radio",
                    value=[
                        lb_types.TemporalClassificationAnswer(
                            name="radio_radio_option_1",
                            frames=[(701, 1199)],
                            classifications=[
                                lb_types.TemporalClassificationText(
                                    name="radio_radio_text",
                                    value=[
                                        (900, 1199, "radio_radio_text value"),
                                    ],
                                ),
                            ],
                        ),
                        lb_types.TemporalClassificationAnswer(
                            name="radio_radio_option_2",
                            frames=[(1200, 1400)],
                        ),
                    ],
                ),
            ],
        ),
        lb_types.TemporalClassificationAnswer(
            name="radio_class_option_3",
            frames=[(2600, 3500)],
            classifications=[
                lb_types.TemporalClassificationQuestion(
                    name="radio_checklist",
                    value=[
                        lb_types.TemporalClassificationAnswer(
                            name="radio_checklist_option_1",
                            frames=[(2600, 3300)],
                            classifications=[
                                lb_types.TemporalClassificationQuestion(
                                    name="radio_checklist_radio",
                                    value=[
                                        lb_types.TemporalClassificationAnswer(
                                            name="radio_checklist_radio_option_1",
                                            frames=[(2600, 3300)],
                                        ),
                                    ],
                                ),
                            ],
                        ),
                        lb_types.TemporalClassificationAnswer(
                            name="radio_checklist_option_2",
                            frames=[(3000, 3500)],
                        ),
                    ],
                ),
            ],
        ),
    ],
)

# CHECKLIST_CLASS: Checklist > Text/Radio/Checklist branches
checklist_class_annotation = lb_types.TemporalClassificationQuestion(
    name="checklist_class",
    value=[
        lb_types.TemporalClassificationAnswer(
            name="checklist_class_option_1",
            frames=[(200, 1899)],
            classifications=[
                lb_types.TemporalClassificationText(
                    name="checklist_text",
                    value=[
                        (200, 1899, "checklist_text value"),
                    ],
                    classifications=[
                        lb_types.TemporalClassificationText(
                            name="checklist_text_text",
                            value=[
                                (400, 1899, "checklist_text_text value"),
                            ],
                        ),
                    ],
                ),
            ],
        ),
        lb_types.TemporalClassificationAnswer(
            name="checklist_class_option_2",
            frames=[(900, 2500)],
            classifications=[
                lb_types.TemporalClassificationQuestion(
                    name="checklist_radio",
                    value=[
                        lb_types.TemporalClassificationAnswer(
                            name="option_1_checklist_radio",
                            frames=[(900, 1999)],
                            classifications=[
                                lb_types.TemporalClassificationQuestion(
                                    name="checklist_radio_radio",
                                    value=[
                                        lb_types.TemporalClassificationAnswer(
                                            name="option_1_checklist_radio_radio",
                                            frames=[(1100, 1999)],
                                        ),
                                    ],
                                ),
                            ],
                        ),
                        lb_types.TemporalClassificationAnswer(
                            name="option_2_checklist_radio",
                            frames=[(2000, 2500)],
                        ),
                    ],
                ),
            ],
        ),
        lb_types.TemporalClassificationAnswer(
            name="checklist_class_option_3",
            frames=[(2600, 3500)],
            classifications=[
                lb_types.TemporalClassificationQuestion(
                    name="checklist_checklist",
                    value=[
                        lb_types.TemporalClassificationAnswer(
                            name="option_1_checklist_checklist",
                            frames=[(2600, 3300)],
                            classifications=[
                                lb_types.TemporalClassificationQuestion(
                                    name="checklist_checklist_checklist",
                                    value=[
                                        lb_types.TemporalClassificationAnswer(
                                            name="option_2_checklist_checklist_checklist",
                                            frames=[(2600, 2999)],
                                        ),
                                        lb_types.TemporalClassificationAnswer(
                                            name="option_3_checklist_checklist_checklist",
                                            frames=[(2800, 3300)],
                                        ),
                                    ],
                                ),
                            ],
                        ),
                        lb_types.TemporalClassificationAnswer(
                            name="option_2_checklist_checklist",
                            frames=[(3000, 3500)],
                        ),
                    ],
                ),
            ],
        ),
    ],
)


# Create Label with all temporal annotations
label = Label(
    data={"global_key": asset["global_key"]},
    annotations=[
        text_class_annotation,
        radio_class_annotation,
        checklist_class_annotation,
    ],
)

ndjson_predictions = list(NDLabel.from_common([label]))

# TemporalNDJSON objects are returned directly (not wrapped in NDLabel.annotations)
# They already have the correct structure: name, answer, dataRow
predictions_for_upload = [pred.model_dump() for pred in ndjson_predictions]



if args.upload_raw or args.upload_computed:
    print("\n" + "="*80)
    print("STARTING MAL UPLOAD PROCESS")
    print("="*80)

    # API Configuration (keys left blank here; fill in before running)
    api_key_local = ""
    api_key_prod = ""

    rest_endpoint_local = "http://localhost:3000/api/v1"
    endpoint_local = "http://localhost:8080/graphql"

    # Initialize client based on environment
    if args.env == 'prod':
        print(f"🌍 Using PROD environment")
        client = lb.Client(api_key=api_key_prod)
    else:
        print(f"🌍 Using LOCAL environment")
        client = lb.Client(
            api_key=api_key_local,
            endpoint=endpoint_local,
            rest_endpoint=rest_endpoint_local
        )

    # Generate unique global key for the asset
    global_key_upload = f"audio-token-demo-class-based-{str(uuid.uuid4())}"
    asset_upload = {
        "row_data": "https://storage.googleapis.com/lb-artifacts-testing-public/audio/gpt_how_can_you_help_me.wav",
        "global_key": global_key_upload,
        "media_type": "AUDIO",
    }

    print(f"\n📦 Creating dataset...")
    dataset = client.create_dataset(
        name=f"audio_token_demo_class_based_dataset_{str(uuid.uuid4())[:8]}",
        iam_integration=None
    )
    print(f"✓ Dataset created: {dataset.uid}")

    print(f"\n📤 Uploading data row with global_key: {global_key_upload}")
    task = dataset.create_data_rows([asset_upload])
    task.wait_till_done()
    print(f"✓ Data row uploaded")

    print(f"\n🏗️  Creating ontology...")
    ontology = client.create_ontology(
        f"Audio with permutations class-based {str(uuid.uuid4())[:8]}",
        ontology_builder.asdict(),
        media_type=lb.MediaType.Audio,
    )
    print(f"✓ Ontology created: {ontology.uid}")

    print(f"\n📋 Creating project...")
    project = client.create_project(
        name=f"Audio with permutations class-based {str(uuid.uuid4())[:8]}",
        media_type=lb.MediaType.Audio
    )
    print(f"✓ Project created: {project.uid}")

    print(f"\n🔗 Connecting ontology to project...")
    project.connect_ontology(ontology)
    print(f"✓ Ontology connected")

    print(f"\n📦 Creating batch...")
    batch = project.create_batch(
        f"audio-token-batch-class-based-{str(uuid.uuid4())[:8]}",
        global_keys=[global_key_upload],
        priority=5,
    )
    print(f"✓ Batch created: {batch.uid}")

    # Upload the COMPUTED predictions produced by the class-based API
    print(f"\n📤 Preparing COMPUTED predictions from class-based API...")
    upload_predictions = predictions_for_upload
    # Update global_key in predictions to match uploaded asset
    for pred in upload_predictions:
        if 'dataRow' in pred and 'globalKey' in pred['dataRow']:
            pred['dataRow']['globalKey'] = global_key_upload
    upload_type = "COMPUTED"

    print(f"📊 Uploading {len(upload_predictions)} {upload_type} predictions...")

    try:
        upload_job = lb.MALPredictionImport.create_from_objects(
            client=client,
            project_id=project.uid,
            name=f"audio_token_mal_{upload_type.lower()}_{str(uuid.uuid4())[:8]}",
            predictions=upload_predictions,
        )

        print("⏳ Waiting for MAL upload to complete...")
        upload_job.wait_till_done()

        if upload_job.errors:
            print(f"❌ Errors: {upload_job.errors}")
        else:
            print(f"✓ MAL upload completed successfully!")
            print(f"ℹ️  No errors reported")

    except Exception as e:
        print(f"❌ MAL upload failed: {e}")
        import traceback
        traceback.print_exc()

    print(f"\n🌐 View Project in Browser:")
    print(f"   - Project ID: {project.uid}")
    print(f"   - URL: http://localhost:3000/projects/{project.uid}/overview")
    print("="*80)
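
Side note on the global-key rewrite in the script: `upload_predictions` is just another name for `predictions_for_upload`, so the loop mutates the original list in place. A minimal sketch of a non-mutating variant, assuming each prediction is a plain dict with a `dataRow.globalKey` entry as in the script. `retarget_global_key` is a hypothetical helper for illustration, not part of the SDK.

```python
import copy
from typing import Any, Dict, List


def retarget_global_key(
    predictions: List[Dict[str, Any]], global_key: str
) -> List[Dict[str, Any]]:
    """Return deep copies of `predictions` that point at `global_key`."""
    retargeted = copy.deepcopy(predictions)
    for pred in retargeted:
        data_row = pred.get("dataRow")
        # Only touch entries that already carry a globalKey, as the script does.
        if isinstance(data_row, dict) and "globalKey" in data_row:
            data_row["globalKey"] = global_key
    return retargeted


# Hypothetical prediction dict, shaped like the script's name/answer/dataRow output.
original = [{"name": "text_class", "dataRow": {"globalKey": "old-key"}}]
updated = retarget_global_key(original, "new-key")
print(updated[0]["dataRow"]["globalKey"])   # new-key
print(original[0]["dataRow"]["globalKey"])  # old-key
```

This keeps the serialized predictions reusable if the same list is later uploaded against a different data row.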

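In the script above, every nested `frames` range sits inside its parent answer's range, e.g. `(2600, 3300)` inside `(2600, 3500)`. A minimal sketch of a pre-upload check for that property, using plain dicts rather than the SDK types. The `{"name", "frames", "children"}` layout and `check_nested_frames` are hypothetical, and this PR does not state whether the SDK itself enforces containment.

```python
from typing import Any, Dict, List, Optional, Tuple

Frame = Tuple[int, int]  # (start_ms, end_ms)


def is_contained(child: Frame, parents: List[Frame]) -> bool:
    """True if `child` lies fully inside at least one parent range."""
    return any(start <= child[0] and child[1] <= end for start, end in parents)


def check_nested_frames(
    node: Dict[str, Any],
    parents: Optional[List[Frame]] = None,
    path: str = "",
) -> List[str]:
    """Walk the tree and collect a message for every out-of-range frame."""
    here = f"{path}/{node['name']}" if path else node["name"]
    problems: List[str] = []
    for frame in node["frames"]:
        if frame[0] > frame[1]:
            problems.append(f"{here}: start {frame[0]} is after end {frame[1]}")
        elif parents is not None and not is_contained(frame, parents):
            problems.append(f"{here}: {frame} is outside parent ranges {parents}")
    for child in node.get("children", []):
        problems.extend(check_nested_frames(child, node["frames"], here))
    return problems


# Mirrors radio_class_option_3 from the script.
tree = {
    "name": "radio_class_option_3",
    "frames": [(2600, 3500)],
    "children": [
        {
            "name": "radio_checklist_option_1",
            "frames": [(2600, 3300)],
            "children": [
                {"name": "radio_checklist_radio_option_1", "frames": [(2600, 3300)]},
            ],
        },
        {"name": "radio_checklist_option_2", "frames": [(3000, 3500)]},
    ],
}
print(check_nested_frames(tree))  # prints []
```
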
classifications: Optional[
List[Union["TemporalClassificationText", "TemporalClassificationQuestion"]]
] = None
feature_schema_id: Optional[Cuid] = None
Member

What is feature_schema_id for?

addressed

classifications: Optional[
List[Union["TemporalClassificationText", "TemporalClassificationQuestion"]]
] = None
feature_schema_id: Optional[Cuid] = None
Member

What is feature_schema_id for?

we don't actually use it. Removing it for now

addressed

List[Union["TemporalClassificationText", "TemporalClassificationQuestion"]]
] = None
feature_schema_id: Optional[Cuid] = None
extra: Dict[str, Any] = Field(default_factory=dict)
Member

What is extra for? Do we use it anywhere?

we don't actually use it. Removing it for now

addressed

yield NDObject.from_common(segments, label.data)

@classmethod
def _create_temporal_annotations(
Member

Should we call it _create_temporal_classifications or smth like this? To differentiate them from temporal annotations (video bbox, polylines, etc)

Author

@rishisurana-labelbox Oct 7, 2025

DISCARD -- If you refer to L74-L78 in this file, there is a function called create video annotations, and this one follows the established structure of this class. Based on this, lmk if you think we should change it; imho keep it as is. Lmk if you have a strong opinion here ---

Changing to Classifications

addressed

dataRow: Dict[str, str]


def create_temporal_ndjson_annotations(
Member

This function probably should also be renamed to create_temporal_ndjson_classifications or create_temporal_ndjson_classification_annotations

hmm, i think classification is better than annotation actually. Classification != Annotation, so we should use classification for this.

addressed

@rishisurana-labelbox force-pushed the rishi/ptdt-3807/temporal-audio-support-sdk branch from 5311011 to 9afd82d on October 9, 2025 17:58