
[WIP] Use vt for manually decoding frames. Fixes #533 (#535)

Open · wants to merge 3 commits into base: master
Conversation

felipejfc
Contributor

Two main changes:

1 - Use VideoToolbox to manually decode each frame instead of submitting it directly to AVSampleBufferDisplayLayer. I'm not proud of this change, but it was needed to fix #533. There may be a way to fix the issue without it, but I haven't managed to find one yet.

2 - Latency and smoothness changes
2.1 - Use direct submit in VideoDecoderRenderer (reduces latency)
2.2 - Use the PTS information correctly per frame instead of setting the DisplayImmediately flag on each sample buffer. Together with the change above, I was able to replicate the smooth, low-latency stream I get on the Nvidia Shield. I think using the flag messed with frame timing and caused jittering.

Right now I'm breaking the "Smooth Stream" option that we added some months ago, but I wanted to create the PR anyway so we can discuss options @cgutman
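The PTS-based submission described in 2.2 could be sketched roughly like this (a sketch with assumed names such as framePtsMs, blockBuffer, formatDesc, and frameRate, not code from this PR; it assumes the frame's PTS arrives from the server in milliseconds):

```objc
// Sketch: build timing info from the server-provided PTS instead of tagging
// every sample buffer with kCMSampleAttachmentKey_DisplayImmediately.
CMTime pts = CMTimeMake(framePtsMs, 1000);     // PTS with a millisecond timescale
CMTime duration = CMTimeMake(1, frameRate);    // e.g. 1/60 s per frame at 60 fps
CMSampleTimingInfo timing = {
    .duration = duration,
    .presentationTimeStamp = pts,
    .decodeTimeStamp = kCMTimeInvalid
};

CMSampleBufferRef sampleBuffer;
OSStatus status = CMSampleBufferCreateReady(kCFAllocatorDefault,
                                            blockBuffer,   // compressed frame data
                                            formatDesc,    // H.264/HEVC format description
                                            1, 1, &timing,
                                            0, NULL,
                                            &sampleBuffer);
```

With explicit timing like this, AVSampleBufferDisplayLayer schedules the frame itself rather than displaying it as soon as it is enqueued.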

use pts from moonlight server to schedule frame display

use decompression callback unused frameRef field to propagate frameType information

Use obj-c cb for decode session

Revert to direct decode, use PTS correctly
cgutman (Member) left a comment

These changes are looking pretty good.

I think this has the potential to improve the frame pacing option too. Now that we have access to the decoded samples, we can keep a queue of them and submit them in our display link callback. Since we were queuing compressed video samples before, there was the potential to miss a frame deadline if the decoder couldn't finish decoding a frame by the time it was due to be displayed.
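That decoded-frame queue could look something like the following sketch (hypothetical names such as frameQueue and displayLayer; assumes a CADisplayLink already fires once per refresh):

```objc
// Sketch: queue decoded (uncompressed) sample buffers and drain one per
// display link tick, so decode time can no longer cause a missed deadline.
- (void)displayLinkCallback:(CADisplayLink *)link {
    CMSampleBufferRef frame = NULL;
    @synchronized (self->frameQueue) {   // frameQueue: NSMutableArray of sample buffers (assumed)
        if (self->frameQueue.count > 0) {
            frame = (__bridge_retained CMSampleBufferRef)self->frameQueue.firstObject;
            [self->frameQueue removeObjectAtIndex:0];
        }
    }
    if (frame != NULL) {
        if ([self->displayLayer isReadyForMoreMediaData]) {
            [self->displayLayer enqueueSampleBuffer:frame];
        }
        CFRelease(frame);
    }
}
```

The decompression output callback would append to frameQueue, and the display link drains it at the display's refresh cadence.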

@@ -378,4 +373,65 @@ - (int)submitDecodeBuffer:(unsigned char *)data length:(int)length bufferType:(i
return DR_OK;
}

- (OSStatus) decodeFrameWithSampleBuffer:(CMSampleBufferRef)sampleBuffer frameType:(int)frameType{
VTDecodeFrameFlags flags = kVTDecodeFrame_EnableAsynchronousDecompression;
Member

Does async decompression result in improved performance vs synchronous?

Contributor Author

Honestly, I didn't compare sync/async here because I couldn't find a reliable way to measure performance other than my gut feeling. Any ideas?


Not every frame is decompressed asynchronously; this is the correct setting to increase speed.
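For context, the flag under discussion is passed per decode call; a minimal sketch (assuming the session and sample buffer already exist, and assuming the project's Log() helper with a LOG_E level):

```objc
// Sketch: ask VideoToolbox to decode asynchronously; the decoded image then
// arrives in the decompression output callback instead of blocking this call.
VTDecodeFrameFlags flags = kVTDecodeFrame_EnableAsynchronousDecompression;
VTDecodeInfoFlags infoFlags = 0;
OSStatus status = VTDecompressionSessionDecodeFrame(decompressionSession,
                                                    sampleBuffer,
                                                    flags,
                                                    NULL,        // sourceFrameRefCon
                                                    &infoFlags);
if (status != noErr) {
    Log(LOG_E, @"VTDecompressionSessionDecodeFrame failed: %d", (int)status);
}
```

Omitting the flag (passing 0) makes the call synchronous, which is the easier baseline for latency comparisons.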


OSStatus res = CMVideoFormatDescriptionCreateForImageBuffer(kCFAllocatorDefault, imageBuffer, &formatDescriptionRef);
if (res != noErr){
NSLog(@"Failed to create video format description from imageBuffer");
Member

Please change the NSLog() calls to our Log() function instead.

Contributor Author

should we also call LiRequestIdrFrame() here?

} else {
// I-frame
CFDictionarySetValue(dict, kCMSampleAttachmentKey_NotSync, kCFBooleanFalse);
CFDictionarySetValue(dict, kCMSampleAttachmentKey_DependsOnOthers, kCFBooleanFalse);
Member

These attributes should be set on the H.264/HEVC CMSampleBuffer that we pass to the VTDecompressionSession rather than the CMSampleBuffer we pass to AVSampleBufferDisplayLayer (which is now just raw YUV data).

Contributor Author

makes total sense! will change it

@felipejfc
Contributor Author

felipejfc commented Nov 28, 2022

Plot twist: as part of tackling the improvements you suggested @cgutman, I hit the trigger for the original issue #533. I'm still unsure about the root cause, but it's these lines:

https://github.com/moonlight-stream/moonlight-ios/blob/master/Limelight/Stream/VideoDecoderRenderer.m#L346-L360

I ran some tests, and we don't even need to set any value in the dict; only getting it will cause the decoder to go nuts:

CFArrayRef attachments = CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, YES);
CFMutableDictionaryRef dict = (CFMutableDictionaryRef)CFArrayGetValueAtIndex(attachments, 0);

This is enough to make the decoder fail when the sample buffers contain HDR data; I imagine it must be due to some OS bug. If I remove the whole block, the decoder works just fine.
Interestingly enough, these lines also break the manual VTDecompression flow (error kVTVideoDecoderMalfunctionErr, -12911); that's how I figured I should try commenting them out in the original solution.

Given that:

  • Do you think the solution to decouple decoding still makes sense? Or should we move forward with removing these lines only? Honestly, I'm not sure if they make any difference; from my tests, I see none when I change them.
  • I can send another PR with only this change and the use of PTS information; since we can't get a reference to the dict, this will be needed, as we can't set the DisplayImmediately flag. WDYT?

@cgutman
Member

cgutman commented Nov 30, 2022

Do you think the solution to decouple decoding still makes sense? Or should we move forward with removing these lines only? Honestly, I'm not sure if they make any difference. From my tests, I see none when I change them.

Removing those lines should be fine.

I can send another PR with only this change and using the PTS information; Since we can't get the reference to the dict this will be needed as we can't set DisplayImmediately flag. WDYT?

Yep, let's do that to fix #533 ASAP and we can see if using a VTDecompressionSession improves things further vs the current pure AVSampleBufferDisplayLayer solution.

Do you see a frame pacing regression using the PTS info with the pacing option enabled? If so, we can just use this solution for HDR streaming only for now.

@felipejfc
Contributor Author

@cgutman as part of these changes, I wanted to test different ways of decoding and rendering: using VT to manually decode and update a CALayer with the resulting image, versus continuing to pass the encoded buffer directly to AVSampleBufferDisplayLayer, and then measure the latency of each approach.
Any ideas on how I could benchmark these solutions reliably?

@cgutman
Member

cgutman commented Dec 2, 2022

I suppose you could use a phone in slow motion mode.

For now though, let's try to get HDR on the new Apple TV, then we can fine tune things later.

Can you send your basic PR with just the HDR fix?

@felipejfc
Contributor Author

@cgutman I isolated the changes to fix HDR here: #536. I'll keep researching different methods of drawing to improve latency -- on the newer ATV 4K it's still more noticeable than on M1 Macs and the Nvidia Shield.

@Starlank
Contributor

Starlank commented Dec 2, 2022

Going to build and test the lowest latency option on my Apple TV 4K 2021 with MoCA setup and report back!

@Starlank
Contributor

Starlank commented Dec 2, 2022

@felipejfc do the latency improvements only apply to the 2022 Apple TV 4K?

@felipejfc
Contributor Author

@Starlank the changes here and in the other PR should not improve latency, but they should improve stream "smoothness" when using the low-latency pacing mode. I'm currently studying latency improvements locally.


- (void) setupDecompressionSession {
if (decompressionSession != NULL){
VTDecompressionSessionInvalidate(decompressionSession);

It is also necessary to call the following (to drain pending frames before invalidating):
VTDecompressionSessionWaitForAsynchronousFrames(decompressionSession);
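Put together, the teardown sequence that comment suggests would look like this sketch:

```objc
// Sketch: drain any in-flight asynchronous frames before tearing down the
// session, then invalidate and release it.
if (decompressionSession != NULL) {
    VTDecompressionSessionWaitForAsynchronousFrames(decompressionSession);
    VTDecompressionSessionInvalidate(decompressionSession);
    CFRelease(decompressionSession);
    decompressionSession = NULL;
}
```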

}

// Enqueue the next frame
[self->displayLayer enqueueSampleBuffer:sampleBuffer];

I would add a flush before the [self->displayLayer enqueueSampleBuffer:sampleBuffer] call.

if (![self->displayLayer isReadyForMoreMediaData]) {
        [self->displayLayer flush];
}

Sometimes it happens that not all the data is played and the buffer fills up, so playback stops.

}

CMSampleBufferRef sampleBuffer;
CMSampleTimingInfo sampleTiming = {kCMTimeInvalid, presentationTimestamp, presentationDuration};

I recommend using CACurrentMediaTime() for the timing info.
That way, the frames will be displayed immediately.
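As a sketch, the immediate-display timing that suggestion describes (assuming a known frame duration of 1/60 s; whether wall-clock timestamps are honored also depends on how the layer's controlTimebase is configured):

```objc
// Sketch: stamp the decoded frame with "now" so AVSampleBufferDisplayLayer
// shows it as soon as it is enqueued.
CMTime now = CMTimeMakeWithSeconds(CACurrentMediaTime(), 1000000000);
CMSampleTimingInfo timing = {
    .duration = CMTimeMake(1, 60),        // assumed 60 fps stream
    .presentationTimeStamp = now,
    .decodeTimeStamp = kCMTimeInvalid
};
```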

Contributor Author

I have the impression that if I set it to display immediately, more jittering is generated; maybe because presentationDuration gets messed up?


Set the duration if you know the fps.


But try it without setting the duration; I don't notice the jittering in another project.

decompressionSession = nil;
}

int status = VTDecompressionSessionCreate(kCFAllocatorDefault,

To initialize the VTDecompressionSession you need to set parameters.
If I understand correctly, the sender can send data from various sources, so it's necessary to be prepared for data of various types.
For example:

let imageBufferAttributes = [
    //kCVPixelBufferPixelFormatTypeKey: NSNumber(value: kCVPixelFormatType_......), // if needed
    kCVPixelBufferIOSurfacePropertiesKey: [:] as AnyObject,
    kCVPixelBufferOpenGLESCompatibilityKey: NSNumber(booleanLiteral: true),
    kCVPixelBufferMetalCompatibilityKey: NSNumber(booleanLiteral: true),
    kCVPixelBufferOpenGLCompatibilityKey: NSNumber(booleanLiteral: true)
]

@Alanko5

Alanko5 commented Dec 16, 2022

If you'd like to change SDR to HDR:

let pixelTransferProperties = [
    kVTPixelTransferPropertyKey_DestinationColorPrimaries: kCVImageBufferColorPrimaries_ITU_R_2020,
    kVTPixelTransferPropertyKey_DestinationTransferFunction: kCVImageBufferTransferFunction_SMPTE_ST_2084_PQ,
    kVTPixelTransferPropertyKey_DestinationYCbCrMatrix: kCVImageBufferYCbCrMatrix_ITU_R_2020
]

VTSessionSetProperty(decompressionSession,
                     key: kVTDecompressionPropertyKey_PixelTransferProperties,
                     value: pixelTransferProperties as CFDictionary)

Do not forget that on tvOS it is necessary to switch the TV to HDR mode.

@felipejfc
Contributor Author

Thanks for the review @Alanko5. I have doubts about the manual decompression approach though, as I wasn't able to reduce video latency with it.
What reduced it the most was using the kCVPixelBufferIOSurfacePropertiesKey property, so that the image received in the decompression callback has a backing IOSurface, and then setting the displayLayer contents directly (basically ditching AVSampleBufferDisplayLayer).
Btw, given how much latency the Apple TV has, even the newest model, compared to iPads or iPhones, I think there's some hardware-related latency between the ATV and the display (monitor/TV).
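The IOSurface-backed approach described above could be sketched as follows (an illustration, not the branch's actual code; assumes the render layer is passed as the decompression output refCon, and that frame lifetime is managed elsewhere):

```objc
// Sketch: bypass AVSampleBufferDisplayLayer by handing the decoded frame's
// IOSurface directly to a plain CALayer.
void didDecompress(void *decompressionOutputRefCon, void *sourceFrameRefCon,
                   OSStatus status, VTDecodeInfoFlags infoFlags,
                   CVImageBufferRef imageBuffer, CMTime pts, CMTime duration) {
    if (status != noErr || imageBuffer == NULL) {
        return;
    }
    // Requires kCVPixelBufferIOSurfacePropertiesKey in the session's
    // destinationImageBufferAttributes; otherwise this returns NULL.
    IOSurfaceRef surface = CVPixelBufferGetIOSurface(imageBuffer);
    if (surface != NULL) {
        CALayer *renderLayer = (__bridge CALayer *)decompressionOutputRefCon; // assumed refCon
        dispatch_async(dispatch_get_main_queue(), ^{
            renderLayer.contents = (__bridge id)surface;
        });
    }
}
```

A real implementation would also have to keep the pixel buffer (and thus the surface) alive until the layer has displayed it.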

@felipejfc
Contributor Author

If you like SDR change to HDR:

[...]

Is this SDR->HDR mapping?

@Alanko5

Alanko5 commented Dec 16, 2022

If you like SDR change to HDR: [...]

Is this SDR->HDR mapping?

Yes, Apple mentions it somewhere in the documentation.
It does not generate an HDR image, but it improves SDR.

@Alanko5

Alanko5 commented Dec 16, 2022

Thanks for the review @Alanko5. I have doubts regarding the manual decompression approach though [...]

I don't think it's caused by HW; tvOS is a different system than iOS.
I think it's just a matter of finding some setting that needs to be enabled.
But I may be wrong.

@Alanko5

Alanko5 commented Dec 16, 2022

Thanks for the review @Alanko5. I have doubts regarding the manual decompression approach though [...]

What latency are we talking about?
I just tried to measure the decoding time: H.265 decoding took 0.001 sec.
Where do you see this delay?
Maybe I don't fully understand the problem.

@felipejfc
Contributor Author


What latency are we talking about? Now I tried to measure the decoding time. H265 decoding took 0.001sec. Where do you see this delay? Maybe I don't fully understand the problem.

There's streaming delay with the ATV 4K compared to streaming with an iPhone/iPad or Nvidia Shield. I compared them using a stopwatch application and the slow-mo iPhone camera, comparing the PC screen time with the streaming screen time.

@Alanko5

Alanko5 commented Dec 16, 2022

I understand.
Measure how long it takes you to decode; according to my measurements, it is 0.001 sec.
If you think the decoding is causing the delay, switch to H.264; there, the delay is even smaller.

In my opinion, the timing will help you solve the problem.

Do you not use WiFi when measuring? :-)

@felipejfc
Contributor Author

felipejfc commented Dec 16, 2022

I understand. Measure how long it takes you to decode. [...]

For 4K HEVC I think I was getting ~8 ms to decode each frame, and 10~11 ms total time to receive the whole frame, pack it together, and decode.
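A simple way to get numbers like these is to wrap the decode call with CACurrentMediaTime() (a sketch; assumes the project's Log() helper with a LOG_I level):

```objc
// Sketch: time a single synchronous decode call. With async decompression you
// would instead record a timestamp in sourceFrameRefCon and compute the delta
// in the decompression output callback.
CFTimeInterval start = CACurrentMediaTime();
OSStatus status = VTDecompressionSessionDecodeFrame(decompressionSession,
                                                    sampleBuffer,
                                                    0,     // no flags: synchronous decode
                                                    NULL, NULL);
CFTimeInterval elapsedMs = (CACurrentMediaTime() - start) * 1000.0;
Log(LOG_I, @"Decode took %.2f ms (status %d)", elapsedMs, (int)status);
```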

@Alanko5

Alanko5 commented Dec 16, 2022

Did you measure the decompression time of key and non-key frames?
Can you set the server to send fewer keyframes (for example, one per two seconds)?

What is the total delay of the image that you measured with the camera?

What version of Apple TV do you have?

How do you create the VTDecompressionSession? I mean, what parameters are you setting?

@felipejfc
Contributor Author

The code is in this branch: https://github.com/felipejfc/moonlight-ios/tree/ds_queue_surface

I have the latest 4K Apple TV (2022), with the iPhone 12 Pro processor.

What is the total delay of the image that you measured with the camera?

The streamed image was always 25~50 hundredths of a second behind the original; when testing with the M1 MacBook or Nvidia Shield, the images were in sync most of the time.

Did you measure the decompression time of the key and non-key frames?

I measured all frames and they all took this same amount of time.

Can you set the server to send fewer keyframes?

Pretty sure GameStream won't allow me to do that.

@Alanko5

Alanko5 commented Dec 16, 2022

According to what you write, the problem is not in decoding.
I think that by improving the decoding you can gain at most 5 ms.
The first thing I would look for is a network or rendering delay, because a delay of 250~500 ms is huge!

Well, you can try as follows:

It is necessary to set this value (as I wrote above):
kCVPixelBufferMetalCompatibilityKey

Your destinationImageBufferAttributes:

NSDictionary *pixelAttributes = @{
        (id)kCVPixelBufferMetalCompatibilityKey : (id)kCFBooleanTrue,
        (id)kCVPixelBufferIOSurfaceCoreAnimationCompatibilityKey : (id)kCFBooleanTrue,
        (id)kCVPixelBufferIOSurfacePropertiesKey : @{},
    };

I think that during rendering it would help if the layer could use Metal.

NSDictionary *videoDecoderSpec = @{
         (id) kCMFormatDescriptionExtension_FullRangeVideo : FORMAT_DESC_FullRangeVideo,
         (id) kCVImageBufferChromaLocationBottomFieldKey: kCVImageBufferChromaLocation_Left,
         (id) kCVImageBufferChromaLocationTopFieldKey: kCVImageBufferChromaLocation_Left,
         (id) kCVImageBufferPixelAspectRatioKey: FORMAT_DESC_AspectRatio,
         (id) kCVImageBufferColorPrimariesKey: FORMAT_DESC_ColorPrimaries,
         (id) kCVImageBufferTransferFunctionKey: FORMAT_DESC_TransferFunction,
         (id) kCVImageBufferYCbCrMatrixKey: FORMAT_DESC_YCbCrMatrix
};

@felipejfc
Contributor Author

According to what you write, the problem is not in decoding. [...]

Sorry, I misspelled it; it's actually a 25~50 ms delay! I will try your changes anyway when I get home; I'm travelling right now, so that will only be next week.


Successfully merging this pull request may close these issues.

Apple TV (2022 Version) HDR Black Screen
4 participants