feat(macOS): Capture audio on macOS using Tap API #4209
Conversation
Thank you for the PR!
This isn't a blocker, we just want to understand how much it was used when developing the PR. I've gone ahead and approved the workflow run so the basic linting and whatnot can be taken care of before full reviews take place. Looks like there's an issue with the tests. I don't see an error; it just kind of looks like it crashes out: https://github.com/LizardByte/Sunshine/actions/runs/17332988293/job/49216925169?pr=4209#step:8:3666
Linting: we'll also need another way to compile the tests without the .mm file on non-macOS platforms: https://github.com/LizardByte/Sunshine/actions/runs/17332988293/job/49216934161?pr=4209#step:11:1886
Hey @ReenigneArcher, thanks for kicking things off and for clarifying the AI usage policy! I'll dig into those CI issues next. One question regarding the cross-platform test compilation issue: would you be okay with me updating the test build configuration so the .mm file is excluded on non-macOS platforms? I'll push updates once I have fixes in place. Thanks again for the feedback! 🙂
Sounds fine to me, and I also believe that's the cleanest and easiest approach.
Bundle Report: Changes will increase total bundle size by 121 bytes (0.01%) ⬆️. This is within the configured threshold ✅
Codecov Report: ❌ Patch coverage is 0% with 1 line in your changes missing coverage.
Additional details and impacted files:
@@ Coverage Diff @@
## master #4209 +/- ##
==========================================
- Coverage 12.09% 12.09% -0.01%
==========================================
Files 87 87
Lines 17612 17613 +1
Branches 8097 8097
==========================================
Hits 2131 2131
- Misses 14579 14580 +1
Partials 902 902
Flags with carried forward coverage won't be shown.
Thanks for the PR. I have built this on my M1 Pro MBP and have been trying to get it to work, but haven't had much success so far. I am not able to get any sound captured and/or sent. So far I am only testing with the simplest use case: audio playing out of my MBP speakers. I edited the code to make the tap and aggregate non-private, so I could try to view them with Apple's sample app. I can see the tap, but not the aggregate. I'm not sure what's wrong. Can you detail your testing process? I do seem to be getting OK log entries, and since I'm running from iTerm, all my permissions seem to be in order (iTerm has access to many things). I had to make a few changes to get it to build, plus a fix in our input.cpp that breaks under the latest Xcode 16.4; I'm surprised you didn't run into this one. I am running Xcode 16.4 (clang-1700.0.13.5).
Posting this here for reference in case it's needed for unit-test permissions. We used to need something like this for MacPorts, but it was never necessary for Homebrew.
Sunshine/.github/workflows/CI.yml Lines 841 to 898 in b662b8e
Hey @andygrundman, thanks for taking the time to test the PR. Sorry to hear it's not working correctly. Here's my setup on an M4 MBP and what I did.

```
$ clang --version
Apple clang version 17.0.0 (clang-1700.0.13.5)
Target: arm64-apple-darwin24.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
```

I made the changes in VS Code and followed the documented build steps:

```
mkdir build
cmake -B build -G Ninja -S .
ninja -C build
```

For testing I did the following steps:
So at one point I'm running three streams simultaneously and the audio plays through the devices. My config:

```
audio_sink = Steam Streaming Speakers
macos_system_wide_audio_tap = true
origin_web_ui_allowed = pc
stream_audio = true
wan_encryption_mode = 2
```

I'm familiar with the sample app! It might be the case that the aggregate device settings are still marked as private. You'll need to flip those to `NO`:
Sunshine/src/platform/macos/av_audio.mm Line 604 in 6404705
Sunshine/src/platform/macos/av_audio.mm Line 649 in 6404705
Then the tap will show up in Apple's sample app's UI.
And the aggregate device shows up as well.
But even with those values flipped to `NO`, …
Great. I worry that this is a nightmare for the average user. I think we'll probably have to ship a proper dmg package so that only the actual app gets the permission, and not whatever terminal the user ran `brew install` from. But I'm getting ahead of myself... I will put in some other review notes I have after I do some more testing.
I completely agree that the user experience shouldn't be tied to a specific terminal. At some point, you'll likely want to create an app bundle. I've also added an additional property list key in the `Info.plist`:
Sunshine/src_assets/macos/assets/Info.plist Lines 11 to 12 in 6404705
I will look into a way to properly check for those required permissions, because right now that check is missing. By the way, I ran into the same issue with loading the assets. I initially worked around it by creating a symlink:

```
ln -s /Users/<username>/Sunshine/build/assets/web /usr/local/assets/web
```

However, I didn't like having that lingering around, so I updated `cmake/compile_definitions/unix.cmake`:
Sunshine/cmake/compile_definitions/unix.cmake Lines 1 to 11 in 6404705
I spent a bit of time looking into making a proper app bundle with cmake. It's certainly doable, and mostly just a matter of refactoring the maze of include files. An app bundle would give us easier permissions, asset storage, and signing, and would be much easier for users. The more I run the Mac version, the more I realize the video side of things needs a lot of work too: the host latency stat is missing, min/max framerate support, HDR doesn't work, frametime is uneven, and it's using BGRA textures so it's probably using much more memory/CPU/GPU than it should be. It's one of the only apps that runs my MBP fan... Lots of fun stuff to work on, I guess. What's your use case with Mac Sunshine, if you don't mind me asking? We should assume controller support will also be a lot of work. Apple's "Screen Sharing" app already covers the remote desktop use cases, and no one plays games on their Mac (yet).
andygrundman
left a comment
Overall I think this is a well done patch, and is a good example of how to do an AI-assisted PR.
I originally was worried about the frequent use of manual release, e.g.
[audioInput release];
[audioOutput release];
but then I learned that ObjC built by cmake doesn't use the newer ARC memory management model that I was more familiar with from developing in Xcode. ARC dates back to 2011's OS X Lion, if you can believe it, but since devs had to opt into using ARC, nobody ever did for this code. So, I guess that's a point for AI here. I would have probably not thought about this and been quite confused at all the memory leaks I was creating. Or more likely it would just result in a lot of compiler errors. (In -fobjc-arc mode, these kinds of release calls are compiler errors.)
docs/configuration.md
Outdated
</tr>
</table>

### macos_system_wide_audio_tap
Is there a reason this needs to be a setting? The main point of confusion with users will be the fact that you don't link your capture to an audio device, the way you do on Windows. If we just defaulted to this, is there a downside?
Just FYI, when I was first playing around with the Tap API I went about it using the
CATapDescription *tapDesc = [[CATapDescription alloc] initExcludingProcesses:toExclude
andDeviceUID:audioSinkUID
withStream:0];
version, which used the default audio device. I never tried the system audio tap, which I suppose works better. For example, the system tap captures audio even when the default audio device is muted; I think device capture would not capture audio in that case, but I haven't tested it.
I kept it as a setting to avoid breaking existing setups (users relying on BlackHole or similar). This way an update wouldn’t disrupt their workflow. That said, I'm happy to defer to you and the other maintainers on whether it makes sense to just default to the system tap.
Also, I’m not entirely sure initStereoGlobalTapButExcludeProcesses is the best long-term choice, since it enforces stereo and might complicate 5.1/7.1 setups? I don’t have a great way to test multichannel properly, so I’d appreciate your input here.
Without looking at the code super closely, I believe it should just use the Tap API if audio/virtual sink are unset. If those are set then use whatever they are set to. With this approach their setup will not change as they had to manually set blackhole or whatever.
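That fallback rule is simple enough to sketch. Here's a hypothetical C++ illustration of the proposed default (the function and field names are illustrative, not Sunshine's actual config members):

```cpp
#include <string>

// Hypothetical sketch of the proposed default: use the system-wide tap only
// when the user has not configured an audio sink or virtual sink, so existing
// BlackHole/Soundflower setups keep working unchanged after an update.
inline bool should_use_system_tap(const std::string &audio_sink,
                                  const std::string &virtual_sink) {
  return audio_sink.empty() && virtual_sink.empty();
}
```

With this rule, a user who previously set `audio_sink = BlackHole 2ch` keeps their device-based capture, while a fresh install gets the tap with no configuration.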
src/platform/macos/av_audio.mm
Outdated
 * @param inClientData Client data containing AVAudioIOProcData structure
 * @return OSStatus indicating success (noErr) or error code
 */
static OSStatus systemAudioIOProcWrapper(AudioObjectID inDevice, const AudioTimeStamp *inNow, const AudioBufferList *inInputData, const AudioTimeStamp *inInputTime, AudioBufferList *outOutputData, const AudioTimeStamp *inOutputTime, void *inClientData) {
Do you think this ObjC call will have any performance overhead? One of the main things the CoreAudio docs drill into you is that these callbacks that run at the highest priority on the audio thread should stay in pure C++ and avoid a lot of things that you wouldn't normally worry about: No malloc/free, no locks, no system calls, and I think no Objective-C method calls, because the ObjC runtime isn't realtime safe.
That said, maybe it's not something we need to worry about.
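One common pattern for keeping the realtime path allocation- and lock-free is a preallocated single-producer/single-consumer ring buffer: the IOProc only writes into fixed storage, and a non-realtime thread drains it. A minimal C++ sketch (not Sunshine's actual implementation) might look like:

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Minimal SPSC ring buffer sketch: the realtime IOProc calls push() and never
// touches malloc, free, locks, or the ObjC runtime; a worker thread calls pop().
// One slot is sacrificed to distinguish full from empty.
template <std::size_t Capacity>
class SpscRing {
public:
  bool push(float sample) {  // realtime-safe producer side
    const auto head = head_.load(std::memory_order_relaxed);
    const auto next = (head + 1) % Capacity;
    if (next == tail_.load(std::memory_order_acquire)) {
      return false;  // full: drop rather than block the audio thread
    }
    buf_[head] = sample;
    head_.store(next, std::memory_order_release);
    return true;
  }

  bool pop(float &out) {  // non-realtime consumer side
    const auto tail = tail_.load(std::memory_order_relaxed);
    if (tail == head_.load(std::memory_order_acquire)) {
      return false;  // empty
    }
    out = buf_[tail];
    tail_.store((tail + 1) % Capacity, std::memory_order_release);
    return true;
  }

private:
  std::array<float, Capacity> buf_ {};
  std::atomic<std::size_t> head_ {0};
  std::atomic<std::size_t> tail_ {0};
};
```

The ObjC method dispatch could then move to the consumer thread, leaving the IOProc itself pure C/C++.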
Good point... My bad! Malloc definitely shouldn’t be in this callback. I went with the ObjC approach initially, but I agree that staying pure C/C++ here would be safer. I’ll adjust that.
// produces linker errors for __isPlatformVersionAtLeast, so we have to use
// a different method.
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wunguarded-availability-new"
I had to patch this section to get it to compile, due to AVCaptureDeviceTypeBuiltInMicrophone and AVCaptureDeviceTypeExternalUnknown being deprecated. It probably wouldn't be the end of the world if we simply required at least macOS 15 and dropped any of this legacy stuff completely.
-#pragma clang diagnostic push
-#pragma clang diagnostic ignored "-Wunguarded-availability-new"
- AVCaptureDeviceDiscoverySession *discoverySession = [AVCaptureDeviceDiscoverySession discoverySessionWithDeviceTypes:@[AVCaptureDeviceTypeBuiltInMicrophone, AVCaptureDeviceTypeExternalUnknown]
+ AVCaptureDeviceDiscoverySession *discoverySession = [AVCaptureDeviceDiscoverySession discoverySessionWithDeviceTypes:@[AVCaptureDeviceTypeMicrophone, AVCaptureDeviceTypeExternal]
mediaType:AVMediaTypeAudio
position:AVCaptureDevicePositionUnspecified];
NSArray *devices = discoverySession.devices;
BOOST_LOG(debug) << "Found "sv << [devices count] << " devices using discovery session"sv;
return devices;
-#pragma clang diagnostic pop
I’m fine with either direction, whether we keep backward compatibility or just require macOS 15+. I’ll follow whatever you and the team prefer.
I don't have a strong preference about supporting 14. We just recently dropped support for 13, a little earlier than Apple, because of missing C++23 features.
If we decide to drop support for 14, we can set that in the homebrew formula here:
Sunshine/packaging/sunshine.rb
Lines 40 to 42 in 705d763
on_macos do
  depends_on xcode: ["15.3", :build]
end
depends_on macos: :sequoia
Also, we'll need to update the minimum version in the readme.
}];
BOOST_LOG(debug) << "Configured audio output with settings: "sv << sampleRate << "Hz, "sv << (int) channels << " channels, 32-bit float"sv;

dispatch_queue_attr_t qos = dispatch_queue_attr_make_with_qos_class(DISPATCH_QUEUE_CONCURRENT, QOS_CLASS_USER_INITIATED, DISPATCH_QUEUE_PRIORITY_HIGH);
It might not ever matter in practice, but the docs for AVCaptureAudioDataOutput say "A serial dispatch queue (DISPATCH_QUEUE_SERIAL) must be used to guarantee that audio samples will be delivered in order.".
src/platform/macos/av_audio.mm
Outdated
// We'll provide a generous buffer and let it tell us what it actually used
UInt32 maxOutputFrames = inputFrames * 4;  // Very generous for any upsampling scenario
UInt32 outputBytes = maxOutputFrames * clientChannels * sizeof(float);
float *outputBuffer = (float *) malloc(outputBytes);
(Here's a malloc in an IOProc.) But I'm not clear on how AudioConverter works with system taps. Since it's not tied to a device running at a particular sample rate, do you always create the tap at 48k? When do you have to do explicit resampling? It would be great if the OS just handled this automatically.
My bad, that malloc/free definitely shouldn’t be there in a real-time callback. I remember noticing the __attribute__((nonblocking)) in the CoreAudio docs but forgot to account for it here. I’ll fix that.
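A typical fix is to size the buffer once, outside the realtime path, for the worst case the converter can produce, and then reuse that storage from the callback. A rough C++ sketch (the type, member names, and slack constant are all illustrative, not Sunshine's code):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: allocate the conversion buffer at init time for the
// worst-case output size, so the realtime IOProc never calls malloc/free.
struct ConversionBuffer {
  std::vector<float> storage;  // allocated once, reused by the callback

  void reserve_worst_case(std::size_t max_input_frames,
                          std::size_t channels,
                          double max_resample_ratio) {
    // Worst-case frame count after resampling, plus slack for converter latency.
    const auto max_output_frames =
        static_cast<std::size_t>(max_input_frames * max_resample_ratio) + 64;
    storage.resize(max_output_frames * channels);
  }

  float *data() { return storage.data(); }

  std::size_t frames(std::size_t channels) const {
    return storage.size() / channels;
  }
};
```

The callback would then pass `data()` to the converter instead of a freshly malloc'd pointer.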
src/platform/macos/av_audio.mm
Outdated
CATapDescription *tapDescription;
NSArray *excludeProcesses = @[];

if (channels == 1) {
I don't think we need to support mono, since the way initStereo handles mono is to duplicate it to left/right, which is what you want.
Speaking of this, I am assuming that stereo is the only supported configuration here. So we should probably be setting the tapDescription's stereo mixdown setting to true. If you've tested multichannel, how does it work? Here is the main use case I often use:
Moonlight 5.1 -> Sunshine playback of a 5.1 FLAC file, 5.1 mkv movies, and obviously games. What audio device has to be used to support 5.1 playback like this, without a virtual multichannel driver like Blackhole?
I can remove the mono handling as you suggested.
Regarding multichannel, I should clarify this a little. I tested on my Android TV Moonlight client by toggling 5.1/7.1 and did get audio, but I'm starting to think it was probably just L+R playback (hard to tell for sure).
In a previous iteration of the code, enabling 5.1 or 7.1 caused crackling on the client, while stereo worked fine. With the changes in this PR, all settings seem to play without issues. On the host side, I tested by launching a game or playing audio through YouTube, but I’m not sure how to verify that all channels are actually coming through correctly. I imagine properly testing multichannel would require additional hardware on the Android TV client.
Since the tap is set up with initStereoGlobalTapButExcludeProcesses, the host always mixes down to stereo. I’m not sure what would happen if we switched to initExcludingProcesses instead and tried it that way... As you suggested, we’d probably need a virtual multichannel driver to tap into that device’s full stream.
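For context on what a stereo mixdown does to a 5.1 source, here is an illustrative C++ sketch using ITU-R BS.775-style -3 dB coefficients for center and surrounds (this is just the math for intuition; the Tap API performs its own mixdown internally, and LFE handling varies):

```cpp
// Illustrative 5.1 -> stereo downmix. The LFE channel is discarded here,
// as is common in simple downmix matrices.
struct Stereo {
  float l, r;
};

inline Stereo downmix_5_1(float fl, float fr, float c, float /*lfe*/,
                          float sl, float sr) {
  const float k = 0.7071f;  // -3 dB gain for center and surround channels
  return {
    fl + k * c + k * sl,
    fr + k * c + k * sr,
  };
}
```

This is why stereo capture "works" for 5.1 content on the host: the channels are folded into L/R rather than dropped, which also explains why it's hard to tell by ear whether true multichannel is coming through.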
tapDescription.name = uniqueName;
tapDescription.UUID = uniqueUUID;
[tapDescription setPrivate:YES];
We probably want to use tapDescription.muteBehavior to support Moonlight's "Mute host" option. If Mute Host is set, use CATapMuted. The WhenTapped option seems dangerous if it would suddenly unmute, so I wouldn't use that.
@constant CATapUnmuted
Audio is captured by the tap and also sent to the audio hardware
@constant CATapMuted
Audio is captured by the tap but no audio is sent from the process to the audio hardware
@constant CATapMutedWhenTapped
Audio is captured by the tap and also sent to the audio hardware until the tap is read by another audio client.
For the duration of the read activity on the tap no audio is sent to the audio hardware.
Sure thing, I can hook up tapDescription.muteBehavior to support the “Mute host” option, as you suggested.
docs/configuration.md
Outdated
@endcode</td>
</tr>
</table>
At the top of this page there is the text "The name of the audio sink used for Audio Loopback. Sunshine can only access microphones on macOS due to system limitations. To stream system audio using Soundflower or BlackHole." We should remove that text.
Also here:
Sunshine/packaging/sunshine.rb Lines 153 to 154 in 705d763
> Sunshine can only access microphones on macOS due to system limitations. To stream system audio use "Soundflower" or "BlackHole".
Sunshine/docs/getting_started.md Lines 376 to 378 in 705d763
> Sunshine can only access microphones on macOS due to system limitations. To stream system audio use [Soundflower](https://github.com/mattingalls/Soundflower) or [BlackHole](https://github.com/ExistentialAudio/BlackHole).
Sunshine/src_assets/common/assets/web/configs/tabs/AudioVideo.vue Lines 37 to 40 in 705d763
> <template #macos> <a href="https://github.com/mattingalls/Soundflower" target="_blank">Soundflower</a><br> <a href="https://github.com/ExistentialAudio/BlackHole" target="_blank">BlackHole</a>. </template>
Maybe we should keep a little blurb somewhere about how to use these two if they don't want to use the Tap API?
I'll remove that.
TEST_F(AVAudioTest, FindMicrophoneWithNilNameReturnsNil) {
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wnonnull"
  AVCaptureDevice *device = [AVAudio findMicrophone:nil];
Many of these tests that check for nil input are crashing, because the first line in findMicrophone is BOOST_LOG(debug) << "Searching for microphone: "sv << [name UTF8String]; and you can't call UTF8String on nil. So there are probably a lot of additional checks for nil that need to go into those methods, or maybe the nil tests are a bit overkill.
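The fix can be as small as a null guard before formatting the log message. In C++ terms (a hypothetical helper for illustration; the actual methods are Objective-C, where the guard would be `if (name == nil) return nil;` before calling `[name UTF8String]`):

```cpp
#include <string>

// Hypothetical helper: never dereference a possibly-null C string when
// building a log message; substitute a placeholder instead of crashing.
inline std::string safe_name(const char *name) {
  return name ? std::string {name} : std::string {"(null)"};
}
```

With guards like this at the top of each method, the nil-input tests can stay and assert the documented "returns nil" behavior instead of crashing.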
Understood, I can fix that.
Yeah, I wondered about ARC too and did a quick search for it. Good to hear that lines up with your findings as well.
d75fde0 to db3d2df (force-pushed)









Description
This PR adds system-wide audio tap support for macOS. The implementation introduces:
Additional changes
- Updated the `cmake` files for compatibility with a Homebrew-based setup, so `openssl` and `opus` are found automatically. This replaces the need to run manual `ln` commands, but I'm not sure if this is the best long-term approach. Feedback welcome.
- Updated `cmake/compile_definitions/unix.cmake` to ensure `SUNSHINE_ASSETS_DIR` resolves correctly.
- Updated `src_assets/macos/assets/Info.plist` to prepare for required macOS permission prompts.
- Added the `macos_system_wide_audio_tap` config option to the `audio_t` struct.

Testing
Notes
- Kept the `setupMicrophone` functionality, as it is also a viable option!
- The new behavior is controlled by the `macos_system_wide_audio_tap` setting.
- If the AI involvement is a blocker for merging, no worries. I can totally understand! This was also a learning project for me and I had a lot of fun building it.
Screenshot
Web UI – Audio/Video Configuration

New option for enabling system-wide audio recording on macOS. Disables the audio sink option when checked.
macOS Permission Prompt

System permission request when Sunshine first tries to access system audio:
System Settings – Screen & System Audio Recording

macOS privacy settings showing Sunshine/Terminal access to "System Audio Recording Only":
Issues Fixed or Closed
Roadmap Issues
Type of Change
Checklist
AI Usage