add efficient AV decoding #38

jon-barker · 2025-01-03T15:42:22Z

There is still tidying and documentation to be done. I am creating this draft PR to get initial feedback on the design and on any big changes required.

voegtlel

Thanks for adding this code! A great addition! A few comments, mainly on the integration.

pyproject.toml

voegtlel · 2025-01-10T15:02:32Z

src/megatron/energon/flavors/webdataset/default_decoder_webdataset.py

+        video_num_frames: int = 64,
+        video_out_frame_size: tuple = (224, 224),


Should these be constants, rather than being set in user code when the frames are fetched? I think there should be a dynamic option as well.
E.g. Open-Sora group-batches videos into differently sized outputs/frame counts.

yep, agreed and i like your suggested design below

voegtlel · 2025-01-14T15:26:01Z

src/megatron/energon/flavors/webdataset/fastseek/__init__.py

@@ -0,0 +1,4 @@
+# Copyright (c) 2024, NVIDIA CORPORATION.


I'd suggest moving this to a more top-level package. Maybe under megatron.energon.video?

ack - will do

src/megatron/energon/flavors/webdataset/decode_av_frames.py

voegtlel · 2025-01-14T15:35:00Z

src/megatron/energon/flavors/webdataset/decode_av_frames.py

+
+DEFAULT_AUDIO_FRAME_SHIFT_MS = 10  # in milliseconds
+
+class AVDecoder:


Maybe we should have an alternative decoder as well, which returns the decoder itself, so the user can decide in user code (=encode_sample) which frames to read?

Like this:

# This function is to be registered as decoder
def read_av_data(key: str, data: bytes):
    if key in ("mp3", ...):
        return AVData(data)

# This class is now passed to the user's `encode_sample` function (i.e. the raw video
# bytes are essentially passed through). This allows the user to decide on the
# parameters on the fly (e.g. for open-sora).
class AVData:
    def __init__(self, raw: bytes):
        ...

    def get_frames(
        self,
        audio_convert_to_melspec: bool = False,
        audio_clip_duration: int = 1,
        audio_num_clips: int = -1,
        audio_target_rate: int = 16000,
        video_decode_audio: bool = False,
        video_num_frames: int = 64,
        video_out_frame_size: tuple = (224, 224),
    ) -> AudioVideoData:
        ...

WDYT?

i like it - happy to make this change
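
For readers following the thread, here is a minimal sketch of the pass-through idea discussed above. It assumes nothing about the real energon or fastseek APIs: the decoder hook only wraps the raw bytes, and the frame selection parameters are supplied later in user code. `LazyAVData`, `FrameRequest`, `pick_frame_indices`, the extension list, and the placeholder frame count are all hypothetical, and the actual decoding step is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


def pick_frame_indices(total_frames: int, num_frames: int) -> List[int]:
    """Return up to `num_frames` evenly spaced indices in [0, total_frames)."""
    if total_frames <= 0 or num_frames <= 0:
        return []
    if num_frames >= total_frames:
        return list(range(total_frames))
    step = total_frames / num_frames
    return [int(i * step) for i in range(num_frames)]


@dataclass
class FrameRequest:
    """Hypothetical record of what the user asked for (no pixels are decoded)."""

    indices: List[int]
    out_frame_size: Tuple[int, int]


class LazyAVData:
    """Hypothetical pass-through wrapper: keeps the raw bytes, decodes nothing up front."""

    def __init__(self, raw: bytes, total_frames: int) -> None:
        self.raw = raw
        # Assumption: a real implementation would read this from the container.
        self.total_frames = total_frames

    def get_frames(
        self,
        video_num_frames: int = 64,
        video_out_frame_size: Tuple[int, int] = (224, 224),
    ) -> FrameRequest:
        # A real decoder would seek and decode here; this sketch only returns the plan.
        indices = pick_frame_indices(self.total_frames, video_num_frames)
        return FrameRequest(indices=indices, out_frame_size=video_out_frame_size)


def read_av_data(key: str, data: bytes) -> Optional[LazyAVData]:
    """Decoder hook: wrap known AV extensions, ignore everything else."""
    ext = key.rsplit(".", 1)[-1].lower()
    if ext in ("mp4", "mp3", "wav"):  # hypothetical extension list
        return LazyAVData(data, total_frames=300)  # placeholder frame count
    return None
```

In user code (for example inside `encode_sample`), each sample could then request its own frame count and size, such as `sample.get_frames(video_num_frames=4, video_out_frame_size=(128, 128))`, which is the kind of per-batch flexibility the Open-Sora comment asks for.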

voegtlel · 2025-01-14T15:42:33Z

src/megatron/energon/flavors/webdataset/decode_av_frames.py

+            )
+        return None
+
+def waveform2melspec(waveform, sample_rate, num_mel_bins, target_length):


I feel the functions below could have their own file, and also reside in the fastseek package?

yes. i'd like to see if i can get rid of the torchaudio dependency here too. i'll revise and relocate this code

voegtlel · 2025-01-14T15:43:07Z

src/megatron/energon/flavors/webdataset/decode_av_frames.py

+class AVDecoder:
+    def __init__(
+            self,
+            audio_convert_to_melspec,


Generally, we have all parameters statically typed. Also all class variables are typically typed.

ack - will fix
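
As an illustration of the typing convention requested above, the following sketch annotates every constructor parameter and declares the instance attributes with class-level annotations. `TypedAVDecoderSketch` is a hypothetical, simplified stand-in and not the class from this PR; the parameter names and defaults are borrowed from the `get_frames` proposal earlier in the review, and no decoding is performed.

```python
from typing import Tuple, get_type_hints


class TypedAVDecoderSketch:
    """Hypothetical stand-in showing fully typed parameters and attributes."""

    # Class-level annotations document the type of each instance attribute.
    audio_clip_duration: int
    audio_num_clips: int
    audio_target_rate: int
    video_decode_audio: bool
    video_num_frames: int
    video_out_frame_size: Tuple[int, int]

    def __init__(
        self,
        audio_clip_duration: int = 1,
        audio_num_clips: int = -1,
        audio_target_rate: int = 16000,
        video_decode_audio: bool = False,
        video_num_frames: int = 64,
        video_out_frame_size: Tuple[int, int] = (224, 224),
    ) -> None:
        self.audio_clip_duration = audio_clip_duration
        self.audio_num_clips = audio_num_clips
        self.audio_target_rate = audio_target_rate
        self.video_decode_audio = video_decode_audio
        self.video_num_frames = video_num_frames
        self.video_out_frame_size = video_out_frame_size


# The annotations can be inspected at runtime, e.g. for a quick sanity check.
init_hints = get_type_hints(TypedAVDecoderSketch.__init__)
```

Narrowing `tuple` to `Tuple[int, int]` is a choice made for this sketch; the reviewed diff uses the bare `tuple` annotation.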

Release 5.2.0

voegtlel requested changes Jan 14, 2025

voegtlel reviewed Jan 14, 2025

voegtlel force-pushed the develop branch from 1898ff4 to 4bccc3f Compare February 7, 2025 09:55

voegtlel and others added 14 commits February 19, 2025 10:53

Merge pull request NVIDIA#73 from NVIDIA/develop

a8d4894

Release 5.2.0

WIP: integrate fastseek

fd1b67b

add video decoding tests

73013d2

tweak video decode test

4bccd8d

WIP: add audio decode with tests

a2a6366

av tests

4428337

debugging audio

7f8f3a8

remove poorly planned audio test

a8408c2

uncomment video test

c83f8e2

add audio resampling and spectrogram conversion

3394f77

WIP: exposing av decode options through energon

d5bd6bf

expose av decode args through energon api

1125c13

support decoding audio clips from a video

4c35cae

small updates

263d231

jon-barker force-pushed the jbarker/efficient_video branch from 824af1d to 263d231 Compare February 26, 2025 21:53

Jon Barker added 4 commits February 26, 2025 14:15

remove melspec functionality. address MR review comments.

0200d15

make av decode dependencies optional and remove unecessary imports

87a8a76

typo

98b5fbe

wav support

d4c1513

jon-barker commented Jan 3, 2025

voegtlel left a comment

voegtlel Jan 10, 2025

jon-barker Feb 24, 2025

voegtlel Jan 14, 2025

jon-barker Feb 24, 2025

voegtlel Jan 14, 2025

jon-barker Feb 24, 2025

voegtlel Jan 14, 2025

jon-barker Feb 24, 2025

voegtlel Jan 14, 2025

jon-barker Feb 24, 2025
