
Audio passthrough #400

Merged
j0sh merged 13 commits into livepeer:main from ja/audio-passthrough on Jan 31, 2025

Conversation

Contributor

@j0sh j0sh commented Jan 24, 2025

Passes through any audio in the input to the output.

Notes:

  • Had to switch to PyAV rather than ffmpeg for better access to timestamps. This resulted in a bunch of cleanup for supporting code that is no longer needed (but also a lot of new additions)
  • We also had to bump up the PyAV version in the Dockerfiles since it was relying on an ancient version of libavcodec/libavformat; hopefully this is not a problem for whatever other dependencies are using PyAV. We need a recent version to be able to support Opus inputs.
  • If no audio is present in the input, no audio will be in the output.
  • Likewise, if no video is present in the input, then no video will be in the output ... we will just pass through the audio 🤷
  • AAC is the output codec. For WebRTC delivery, the audio will need to be transcoded to something like Opus or G711.
  • This downmixes to mono regardless of the input layout / channels (since I suspect that most inference will output mono) but it is easy to change that
  • There is no audio pipeline wired in - not even a noop pipeline. For any audio frames we pop from the input queue, we push directly into the output queue.
  • The output queue is currently unbounded. This is a pre-existing issue and not a new one in this PR; we can fix it later, however we might want to prioritize audio rather than video when dropping.
  • This PR does not affect event metrics (or at least, it tries not to). The existing event metrics are video-only. We probably should define and track some other metrics for audio, but that can be done later.
  • Only tested with trickle protocol / go-livepeer e2e
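The mono downmix mentioned in the notes can be illustrated with a minimal sketch. This is not the PR's actual code (which presumably uses a PyAV resampler); it just shows the channel-averaging idea on planar samples, with numpy standing in for the decoded audio buffer:

```python
import numpy as np

def downmix_to_mono(samples: np.ndarray) -> np.ndarray:
    """Average planar audio of shape (channels, nsamples) down to one channel."""
    if samples.ndim == 1:
        return samples  # already mono
    return samples.mean(axis=0)

stereo = np.array([[1.0, 0.0, 0.5],   # left channel
                   [0.0, 1.0, 0.5]])  # right channel
mono = downmix_to_mono(stereo)  # -> [0.5, 0.5, 0.5]
```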

@j0sh j0sh force-pushed the ja/audio-passthrough branch from 8a0f3fd to 05b7ab3 on January 28, 2025 15:20
    output_image = pipeline.process_frame(input_frame.image)
    output_frame = VideoOutput(input_frame.replace_image(output_image))
    self.output_queue.put(output_frame)
elif isinstance(input_frame, AudioFrame):
Member
neat
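Given the notes above (no audio pipeline, frames popped from the input queue are pushed directly to the output queue), the audio branch in this hunk can be sketched as a self-contained toy. `AudioFrame` and `AudioOutput` here are stand-ins for the PR's classes, not the real definitions:

```python
import queue
from dataclasses import dataclass
from typing import List

@dataclass
class AudioFrame:          # stand-in for the PR's AudioFrame
    samples: bytes

@dataclass
class AudioOutput:         # stand-in for the PR's AudioOutput
    frames: List[AudioFrame]

def route_frame(input_frame, output_queue: queue.Queue) -> None:
    # Passthrough: audio bypasses the pipeline entirely and is
    # forwarded straight from the input queue to the output queue.
    if isinstance(input_frame, AudioFrame):
        output_queue.put(AudioOutput([input_frame]))

q = queue.Queue()
route_frame(AudioFrame(b"pcm"), q)
out = q.get_nowait()  # AudioOutput wrapping the original frame
```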

@j0sh j0sh marked this pull request as ready for review January 30, 2025 20:41
return self.frame.time_base

class AudioOutput(OutputFrame):
frames: List[AudioFrame]
Collaborator

@varshith15 varshith15 commented Jan 31, 2025
Why is this a list, any specific reason? comfystream does the list handling, so this might not be needed?

Contributor Author
Yes - and Comfy should output the list when it's done processing, to minimize IPC and pressure on the queue. Since we know exactly what needs to be sent, send it all at once.
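The batching rationale can be sketched as follows. The class and queue names echo the snippets in this PR, but the decode loop is illustrative, not the actual implementation:

```python
import queue
from dataclasses import dataclass
from typing import List

@dataclass
class AudioFrame:   # stand-in for the PR's AudioFrame
    pts: int

@dataclass
class AudioOutput:  # stand-in for the PR's AudioOutput
    frames: List[AudioFrame]

output_queue: "queue.Queue[AudioOutput]" = queue.Queue()

def emit_batch(decoded: List[AudioFrame]) -> None:
    # One queue item per decode burst instead of one item per frame:
    # fewer hops across the IPC boundary and less pressure on the queue.
    if decoded:
        output_queue.put(AudioOutput(list(decoded)))

emit_batch([AudioFrame(0), AudioFrame(1024)])
batch = output_queue.get_nowait()  # a single item carrying both frames
```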

Collaborator
but comfystream has a list already, you call a function to get the next frame https://github.com/varshith15/comfystream/blob/af132bedd665027fa02c20601b98e0c229e54f51/server/pipeline.py#L74

we want it to be frame by frame in audio as well, otherwise we have to maintain all the frame instances that were sent into the pipeline to assign processed data back into the frames (ref line 150 in runner/app/live/streamer/process.py)

it's just simpler to have it as a single frame because a list is maintained inside comfystream already

Contributor
@varshith15 maybe you can handle this when you do the ai-runner integration and we leave as-is for now since first goal is to get passthru working?

Collaborator
cool, makes sense, can pick this up later when integrating the comfystream package


class TrickleProtocol(StreamProtocol):
    def __init__(self, subscribe_url: str, publish_url: str, control_url: Optional[str] = None, events_url: Optional[str] = None):
        self.subscribe_url = subscribe_url
        self.publish_url = publish_url
        self.control_url = control_url
        self.events_url = events_url
        self.subscribe_queue = queue.Queue[bytearray]()
        self.publish_queue = queue.Queue[bytearray]()
        self.subscribe_queue = queue.Queue[InputFrame]()
Collaborator

@varshith15 varshith15 commented Jan 31, 2025
Contributor Author

Not sure; original code was not asyncio and this PR doesn't change anything there

--extra-cflags="-I/ffmpeg_build/include -I/usr/local/cuda/include" \
--extra-ldflags="-L/ffmpeg_build/lib -L/usr/local/cuda/lib64" \
--extra-libs="-lpthread -lm" \
Contributor

is removing libpthread expected here? I'm assuming various parts of ffmpeg rely on it to run efficiently?

Contributor Author

IIRC it was there to support static linking; I think there was a bug with the pkg-config declaration since that should also be pulling in pthreads.

It is still pulled in dynamically -

$ docker run -it --entrypoint /bin/bash livepeer/ai-runner:live-app-noop
root@733b39a972a0:/app# ldd /usr/local/bin/ffmpeg 
        linux-vdso.so.1 (0x00007ffd429f7000)
        libavdevice.so.61 => /usr/local/lib/libavdevice.so.61 (0x00007ff488af1000)
        libavfilter.so.10 => /usr/local/lib/libavfilter.so.10 (0x00007ff488660000)
        libavformat.so.61 => /usr/local/lib/libavformat.so.61 (0x00007ff4883cf000)
        libavcodec.so.61 => /usr/local/lib/libavcodec.so.61 (0x00007ff487058000)
        libpostproc.so.58 => /usr/local/lib/libpostproc.so.58 (0x00007ff487045000)
        libswresample.so.5 => /usr/local/lib/libswresample.so.5 (0x00007ff487024000)
        libswscale.so.8 => /usr/local/lib/libswscale.so.8 (0x00007ff486f8e000)
        libavutil.so.59 => /usr/local/lib/libavutil.so.59 (0x00007ff485e89000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff485da2000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff485b7a000)
        libxcb.so.1 => /lib/x86_64-linux-gnu/libxcb.so.1 (0x00007ff485b50000)
        libnppig.so.12 => /usr/local/cuda/targets/x86_64-linux/lib/libnppig.so.12 (0x00007ff483400000)
        libnppicc.so.12 => /usr/local/cuda/targets/x86_64-linux/lib/libnppicc.so.12 (0x00007ff482a00000)
        libnppidei.so.12 => /usr/local/cuda/targets/x86_64-linux/lib/libnppidei.so.12 (0x00007ff481c00000)
        libnppif.so.12 => /usr/local/cuda/targets/x86_64-linux/lib/libnppif.so.12 (0x00007ff47be00000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007ff485b32000)
        libbz2.so.1.0 => /lib/x86_64-linux-gnu/libbz2.so.1.0 (0x00007ff485b1f000)
        liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007ff4833d5000)
        libfdk-aac.so.2 => /lib/x86_64-linux-gnu/libfdk-aac.so.2 (0x00007ff4828cf000)
        libopus.so.0 => /lib/x86_64-linux-gnu/libopus.so.0 (0x00007ff483377000)
        libx264.so.163 => /lib/x86_64-linux-gnu/libx264.so.163 (0x00007ff47bb3f000)
        /lib64/ld-linux-x86-64.so.2 (0x00007ff488b6b000)
        libXau.so.6 => /lib/x86_64-linux-gnu/libXau.so.6 (0x00007ff483371000)
        libXdmcp.so.6 => /lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007ff483369000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ff483364000)
        libnppc.so.12 => /usr/local/cuda/targets/x86_64-linux/lib/libnppc.so.12 (0x00007ff47b600000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007ff48335f000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff48335a000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ff48333a000)
        libbsd.so.0 => /lib/x86_64-linux-gnu/libbsd.so.0 (0x00007ff483322000)
        libmd.so.0 => /lib/x86_64-linux-gnu/libmd.so.0 (0x00007ff483315000)
root@733b39a972a0:/app# 

if audio_stream and packet.stream == audio_stream:
    # Decode audio frames
    for aframe in packet.decode():
        if aframe.pts is None:
Contributor

Under what conditions would a decoded audio packet have no PTS set?
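One common defensive pattern for this situation (an assumption about intent, not what this PR necessarily does) is to synthesize a PTS from a running sample counter rather than drop the frame. A minimal sketch, with `DecodedAudio` standing in for `av.AudioFrame`:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class DecodedAudio:       # stand-in for av.AudioFrame
    pts: Optional[int]
    samples: int          # number of samples in this frame

def assign_missing_pts(frames: List[DecodedAudio], start: int = 0) -> List[DecodedAudio]:
    """Fill in pts for frames that lack one, counting in sample units."""
    pos = start
    for f in frames:
        if f.pts is None:
            f.pts = pos
        pos = f.pts + f.samples
    return frames

frames = assign_missing_pts([DecodedAudio(None, 1024), DecodedAudio(None, 1024)])
# first frame gets pts 0, second gets pts 1024
```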


elif video_stream and packet.stream == video_stream:
    # Decode video frames
    for frame in packet.decode():
Contributor

Does this do sw or hw decode behind the scenes now that ffmpeg is not in this flow?

Contributor Author

Software only for now. The old ffmpeg code was also software-only (there was logic for GPU but we never enabled it)

We can enable GPU here eventually but want to keep this like-for-like to avoid introducing new issues, IIRC we've seen problems with transcoding + AI workloads

loop.call_soon_threadsafe(do_schedule)

# hold tasks since `loop.create_task` is a weak reference that gets GC'd
# TODO use asyncio.TaskGroup once all pipelines are on Python 3.11+
Contributor

Can we not switch to Python 3.11 right now? Is there a limitation? The logic here looks alright, but using TaskGroup would simplify it a bit by avoiding manual tracking of tasks. We can merge like this for now and re-test, but perhaps we should consider revisiting before launch if we see issues? wdyt?

cc @victorges as well for comments on this since I'm still relearning Pythonic concurrency.

Contributor Author

I believe there are dependency limitations in some pipelines that preclude us from using newer Python, although I am not sure what the minimum version would be; @victorges might know better. Didn't want to introduce the risk of a Python upgrade and pipeline validation alongside this PR, which is already invasive enough as-is.
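The manual task tracking being discussed (the "hold tasks since `loop.create_task` is a weak reference" comment) can be sketched as follows; the `schedule` helper and `background_tasks` names are illustrative, not from the PR:

```python
import asyncio

# Hold strong references to in-flight tasks: the event loop keeps only a
# weak reference, so an otherwise un-referenced Task can be garbage-collected
# before it finishes running.
background_tasks = set()

def schedule(coro, loop: asyncio.AbstractEventLoop) -> None:
    task = loop.create_task(coro)
    background_tasks.add(task)
    # Drop our reference once the task completes.
    task.add_done_callback(background_tasks.discard)

async def main():
    results = []
    async def work(n):
        results.append(n)
    schedule(work(1), asyncio.get_running_loop())
    await asyncio.sleep(0.01)  # let the scheduled task run
    return results

print(asyncio.run(main()))  # [1]
```

On Python 3.11+, `asyncio.TaskGroup` replaces this bookkeeping entirely, which is what the TODO in the PR points at.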

@emranemran
Contributor

Left a few generic comments on things we may need follow-up on - but I think it looks good to merge to staging and get some testing going before spending too much time with local testing and iteration. wdyt?

@j0sh j0sh merged commit 35fca9c into livepeer:main Jan 31, 2025
6 of 7 checks passed
@j0sh j0sh deleted the ja/audio-passthrough branch January 31, 2025 22:25
4 participants