Draft: Add many audio sources (including voice) #5870

Open: wants to merge 6 commits into dev

@rom1v
Collaborator

rom1v commented Feb 22, 2025

The existing audio sources were:

  • output (default): forwards the whole audio output, and disables playback on the device (mapped to REMOTE_SUBMIX).
  • playback: captures the audio playback (Android apps can opt-out, so the whole output is not necessarily captured).
  • mic: captures the microphone (mapped to MIC).

This PR adds:

  • mic-unprocessed: captures the microphone unprocessed (raw) sound (mapped to UNPROCESSED).
  • mic-camcorder: captures the microphone tuned for video recording, with the same orientation as the camera if available (mapped to CAMCORDER).
  • mic-voice-recognition: captures the microphone tuned for voice recognition (mapped to VOICE_RECOGNITION).
  • mic-voice-communication: captures the microphone tuned for voice communications (it will for instance take advantage of echo cancellation or automatic gain control if available) (mapped to VOICE_COMMUNICATION).
  • voice-call: captures voice call (mapped to VOICE_CALL).
  • voice-call-uplink: captures voice call uplink only (mapped to VOICE_UPLINK).
  • voice-call-downlink: captures voice call downlink only (mapped to VOICE_DOWNLINK).
  • voice-performance: captures audio meant to be processed for live performance (karaoke), includes both the microphone and the device playback (mapped to VOICE_PERFORMANCE).
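
For illustration only, the name-to-constant mapping listed above could be sketched as a lookup table. This is not the code from this PR (the scrcpy server is written in Java); `AUDIO_SOURCES` and `resolve_audio_source` are hypothetical names. The integers are the public values of the `android.media.MediaRecorder.AudioSource` constants, and `-1` marks a source with no direct constant, following the convention described in one of this PR's commit messages.

```python
# Hypothetical sketch: option name -> (AudioSource constant name, integer value).
# Integer values are those of android.media.MediaRecorder.AudioSource.
AUDIO_SOURCES = {
    "output": ("REMOTE_SUBMIX", 8),
    "playback": (None, -1),  # uses a dedicated playback capture API, no constant
    "mic": ("MIC", 1),
    "mic-unprocessed": ("UNPROCESSED", 9),
    "mic-camcorder": ("CAMCORDER", 5),
    "mic-voice-recognition": ("VOICE_RECOGNITION", 6),
    "mic-voice-communication": ("VOICE_COMMUNICATION", 7),
    "voice-call": ("VOICE_CALL", 4),
    "voice-call-uplink": ("VOICE_UPLINK", 2),
    "voice-call-downlink": ("VOICE_DOWNLINK", 3),
    "voice-performance": ("VOICE_PERFORMANCE", 10),
}


def resolve_audio_source(name):
    """Return the integer for an --audio-source value, or -1 if not relevant."""
    try:
        return AUDIO_SOURCES[name][1]
    except KeyError:
        raise ValueError(f"unknown audio source: {name}") from None
```

Keeping the integer next to the name means that adding a new source only requires one new table entry.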

Discontinuities

The existing audio sources always produce a continuous audio stream. A major issue is that some new audio sources (like the "voice call" source) do not produce packets on silence (they only capture during a voice call).

The audio regulator (the component responsible for maintaining a constant latency) assumed that the input audio stream was continuous. In this PR, it now detects discontinuities based on the input PTS and adjusts its behavior accordingly. This only works if the input PTS are "correct".
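
As a rough illustration of PTS-based discontinuity detection (not the regulator's actual code, which lives in the C client), the idea can be sketched as follows. The `Packet` type, the 48 kHz sample rate, and the 20 ms tolerance are assumptions chosen for the example.

```python
from dataclasses import dataclass

SAMPLE_RATE = 48000  # Hz; assumed for illustration


@dataclass
class Packet:
    pts_us: int   # presentation timestamp, in microseconds
    samples: int  # number of audio samples carried by the packet


def find_discontinuities(packets, tolerance_us=20_000):
    """Return the indices of packets that start after a gap in the stream."""
    gaps = []
    expected_us = None  # PTS at which the next contiguous packet should start
    for i, packet in enumerate(packets):
        if expected_us is not None and packet.pts_us - expected_us > tolerance_us:
            gaps.append(i)
        duration_us = packet.samples * 1_000_000 // SAMPLE_RATE
        expected_us = packet.pts_us + duration_us
    return gaps
```

A packet is flagged when its PTS is later than the end of the previous packet by more than the tolerance. If the PTS values themselves are wrong, this check has nothing reliable to work with.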

Another major problem is that, even if the capture timestamps are correct, some encoders (OPUS) rewrite the PTS based on the number of samples (ignoring the input PTS). As a consequence, when encoding in OPUS, the timings are broken: they represent a continuous audio stream where the silences are removed. This breaks the discontinuity detection in the audio regulator (we could work around the problem by relying on the current recv date, since the real time playback itself does not depend on PTS). But the most important problem is that it breaks recording timings. For example:

scrcpy --audio-source=voice-call --record=file.mp4

If the voice call does not start immediately, the audio will not be played at the correct time in the recording.
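
To make the problem concrete, here is a small model of an encoder that derives output PTS from the number of samples it has encoded, ignoring the input PTS. It is a simplified assumption about the behavior described above, not the Android encoder itself; the function name, the 48 kHz rate, and the zero starting point are hypothetical.

```python
def rewrite_pts_from_sample_count(packets, sample_rate=48000):
    """Model an encoder that ignores input PTS.

    `packets` is a list of (input_pts_us, sample_count) tuples. The returned
    PTS list depends only on how many samples were encoded so far.
    """
    output_pts_us = []
    total_samples = 0
    for _input_pts_us, sample_count in packets:
        output_pts_us.append(total_samples * 1_000_000 // sample_rate)
        total_samples += sample_count
    return output_pts_us


# Two 20 ms packets, then 5 seconds without any packet, then a third packet.
captured = [(0, 960), (20_000, 960), (5_000_000, 960)]
rewritten = rewrite_pts_from_sample_count(captured)
```

In this model the third packet, captured at 5 s, ends up with a PTS of 40 ms: the silence has disappeared from the timeline.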

With the AAC encoder, it works (the encoder on the device does not rewrite the PTS based only on the number of samples):

scrcpy --audio-source=voice-call --record=file.mp4 --audio-codec=aac

This PR is in draft due to this unsolved issue.


Aims to fix #5670 and #5412.

Only enable them if SC_AUDIO_REGULATOR_DEBUG is set, as they may spam
the output.

Report the number of silence samples inserted due to underflow every
second, along with the other metrics.

The audio regulator assumed a continuous audio stream. But some audio
sources (like the "voice call" audio source) do not produce any packets
on silence, breaking this assumption.

Use PTS to detect such discontinuities.

TODO: if PTS values are broken, the detection is also broken.

Store the target audio source integer (one of the constants in
android.media.MediaRecorder.AudioSource) in the AudioSource enum (or -1
if not relevant).

This will simplify adding new audio sources.
rom1v changed the base branch from master to dev February 22, 2025 12:00
@Victor239

Can there also be an option to capture no sound? When using multiple virtual display windows and playing audio, it currently plays on all windows, with no way to disable it except through the OS sound settings.

@rom1v
Collaborator Author

rom1v commented Feb 25, 2025

Can there also be an option to capture no sound?

https://github.com/Genymobile/scrcpy/blob/master/doc/audio.md#no-audio

rom1v added 2 commits March 2, 2025 17:17
The OPUS encoder on Android rewrites the PTS so that it exactly matches
the number of samples.

As a consequence:
 - clock drift is not compensated
 - hard silences are ignored

To fix this behavior, as a best effort, recreate the PTS based on the
current time (after encoding) and the packet duration.
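
A best-effort fix of this kind could look like the sketch below. It is only an illustration under stated assumptions, not the code of the commit: the `PtsFixer` class is hypothetical, the clock is injectable so the logic can be tested, and clamping to the end of the previous packet (to keep PTS from overlapping) is my own addition.

```python
import time


class PtsFixer:
    """Recreate packet PTS from the current time and the packet duration."""

    def __init__(self, clock_us=None):
        if clock_us is None:
            # Default clock: monotonic time in microseconds.
            clock_us = lambda: time.monotonic_ns() // 1000
        self._clock_us = clock_us
        self._previous_end_us = None

    def fix(self, duration_us):
        # The packet has just been encoded, so its content was captured
        # roughly `duration_us` before now.
        pts_us = self._clock_us() - duration_us
        # Assumption: never start before the previous packet ended.
        if self._previous_end_us is not None and pts_us < self._previous_end_us:
            pts_us = self._previous_end_us
        self._previous_end_us = pts_us + duration_us
        return pts_us
```

Because the PTS now follows a real clock, a long period without packets shows up as a gap in the timestamps instead of being silently removed.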
@rom1v
Collaborator Author

rom1v commented Mar 2, 2025

Another major problem is that, even if the capture timestamps are correct, some encoders (OPUS) rewrite the PTS based on the number of samples (ignoring the input PTS). As a consequence, when encoding in OPUS, the timings are broken: they represent a continuous audio stream where the silences are removed. This breaks the discontinuity detection in the audio regulator (we could work around the problem by relying on the current recv date, since the real time playback itself does not depend on PTS). But the most important problem is that it breaks recording timings.

This PR is in draft due to this unsolved issue.

This should be fixed by the commit "Fix PTS produced by the default OPUS encoder" in this PR (the SHA1 will change on rebase, but it is currently 63d848f).

Please review/test/check.

@LaptopDev

LaptopDev commented Mar 3, 2025

ref So because VOICE_UPLINK restricts 3rd party apps, microphone source cannot be passed from computer to phone during calls?

@yNEX

yNEX commented Mar 5, 2025

I tested the changes from this PR using a private fork and built the project by using the GitHub Action. For my testing scenario, I received a WhatsApp call from a second phone. I tried both the --audio-source=voice-call-downlink option and voice-call-uplink and in both cases, the audio was transferred regardless of which phone was muted.

Additionally, with the regular --audio-source=playback option, the audio is no longer played back on the device. Is it possible to extend this behavior to voice calls as well?

I am using a Pixel 8 Pro (Android 15) and the Windows client.

@rom1v
Collaborator Author

rom1v commented Mar 5, 2025

I tried both the --audio-source=voice-call-downlink option and voice-call-uplink and in both cases, the audio was transferred

👍 Thank you for the test.

Additionally, with the regular --audio-source=playback option, the audio is no longer played back on the device. Is it possible to extend this behavior to voice calls as well?

The playback audio source uses a specific API, where we can request to duplicate audio or not (--audio-dup). For the others, we have no control (Android determines the behavior).

@yNEX

yNEX commented Mar 5, 2025

Thanks for the quick response! 👌🏼

My idea was to use scrcpy to transfer both game audio and voice chat from Call of Duty Mobile to my PC for streaming with OBS. While everything works fine for the most part, I’m encountering an issue with voice call audio. When headphones are connected directly to the phone, the game sound and voice chat are bundled together. However, since I’m using the headphones on my PC, the audio streams remain separated on the phone.

Do you have any suggestions for this use case? Unfortunately, a capture card isn’t an option as it reduces the refresh rate from 120 Hz to 60 Hz. If it’s more convenient, we could discuss this privately to avoid cluttering the PR comment section.

@rom1v
Collaborator Author

rom1v commented Mar 5, 2025

When headphones are connected directly to the phone, the game sound and voice chat are bundled together. However, since I’m using the headphones on my PC, the audio streams remain separated on the phone.

See #4084 #4087. Scrcpy has no control over this behavior.

Development

Successfully merging this pull request may close these issues.

[Feature request] option to choose unprocessed microphone output of phone (and other processing options)
4 participants