Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigation: timers synchronization #33

Draft
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

ball-hayden
Copy link
Contributor

Various thoughts while investigating wix/Detox#3207

Best reviewed commit-by-commit.

Thoughts and explanations are left as comments inline.

@@ -72,6 +72,9 @@ CFRunLoopTimerRef __detox_sync_CFRunLoopTimerCreate(CFAllocatorRef allocator, CF
static CFRunLoopTimerRef (*__orig_CFRunLoopTimerCreateWithHandler)(CFAllocatorRef allocator, CFAbsoluteTime fireDate, CFTimeInterval interval, CFOptionFlags flags, CFIndex order, void (^block) (CFRunLoopTimerRef timer));
CFRunLoopTimerRef __detox_sync_CFRunLoopTimerCreateWithHandler(CFAllocatorRef allocator, CFAbsoluteTime fireDate, CFTimeInterval interval, CFOptionFlags flags, CFIndex order, void (^block) (id timer))
{
NSLog(@"🤔 CFRunLoopTimerCreateWithHandler");

// What is this doing? We don't seem to be creating a trampoline here - I feel like we should be?
return (__bridge_retained CFRunLoopTimerRef)[[NSTimer alloc] initWithFireDate:CFBridgingRelease(CFDateCreate(allocator, fireDate)) interval:interval repeats:interval > 0 block:block];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why we're swizzling CFRunLoopTimerCreateWithHandler without adding any tracking for these timers (as far as I can tell).

Unless initWithFireDate somehow ends up calling CFRunLoopTimerCreate?

Copy link
Contributor Author

@ball-hayden ball-hayden Apr 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This happens a lot when running the Detox example app.
I don't understand what this is doing.

Copy link
Contributor

@asafkorem asafkorem Apr 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless initWithFireDate somehow ends up calling CFRunLoopTimerCreate

Probably yes. This can be easily checked by putting a breakpoint on __detox_sync_CFRunLoopTimerCreate.
https://github.com/wix/DetoxSync/tree/master/ExampleApp

Comment on lines 99 to +102
id<DTXTimerProxy> trampoline = [DTXTimerSyncResource existingTimerProxyWithTimer:NS(timer)];
[trampoline untrack];

__orig_CFRunLoopRemoveTimer(rl, timer, mode);

[trampoline untrack];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've observed that [_DTXTimerTrampoline fire:] is called after the trampoline has been untracked (and therefore dealloced).

By switching these, I was hoping that the timer would be cancelled (ensuring that the fire method is not called after untracking).

This doesn't seem to be the case, and I'm still seeing fire called after the trampoline has been untracked, but it still feels safer to me to remove the timer before untracking?

Copy link
Contributor

@asafkorem asafkorem Apr 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Untracking the timer only after removing it from the runloop does make sense, but I'm not sure if it matters.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[_DTXTimerTrampoline fire:] is called after the trampoline has been untracked

That's very weird. 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the thread where this timer was created and the thread where the timer was called to invalidate are different?

Apple docs (invalidate()):

You must send this message from the thread on which the timer was installed. If you send this message from another thread, the input source associated with the timer may not be removed from its run loop, which could prevent the thread from exiting properly.

Comment on lines +127 to +135
- (void)dealloc
{
id<DTXTimerProxy> trampoline = [DTXTimerSyncResource existingTimerProxyWithTimer:self];

if(trampoline) {
NSLog(@"🤦‍♂️ dealloc, but trampoline was still active: %@", self);
[trampoline untrack];
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think adding this makes it safe to remove _DTXCleanTimersAndReturnCount (since we've then got a nicer way of calling untrack if the timer is dealloced)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense.

@@ -86,8 +86,8 @@ - (BOOL)isDead

- (void)dealloc
{
[self untrack];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand properly, the only time we should be deallocing is when the Trampoline is removed from DTXTimerSyncResource _timers?

For us to have done that, we must have called untrack already.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I understand as well. I will revisit this and make sure that's indeed the case but generally I think you're right.

@@ -86,8 +86,8 @@ - (BOOL)isDead

- (void)dealloc
{
[self untrack];

NSLog(@"🤦‍♂️ trampoline dealloc: %@ (tracking: %d)", self, _tracking);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I added this to verify the assumption above. I've not seen this called with tracking YES).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -112,47 +112,24 @@ - (void)setDisplayLink:(CADisplayLink*)displayLink
#endif
}

- (void)fire:(id)timer
- (void)fire
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following changes are a bit more involved.

Firstly, there's no need for the timer parameter - we already have a reference to that as an ivar.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can understand why the timer argument is redundant.

Comment on lines -117 to -142
//This is to ensure the timer is still valid after fire.
CFRunLoopRef runloop = CFRunLoopGetCurrent();
CFRunLoopMode mode = CFRunLoopCopyCurrentMode(runloop);
CFRunLoopPerformBlock(runloop, mode, ^{
if(CFRunLoopTimerIsValid((__bridge CFRunLoopTimerRef)timer) == NO)
{
[self untrack];

CFRelease(mode);

return;
}

CFRunLoopPerformBlock(runloop, mode, ^{
if(CFRunLoopTimerIsValid((__bridge CFRunLoopTimerRef)timer) == NO)
{
[self untrack];

CFRelease(mode);

return;
}

CFRelease(mode);
});
});
Copy link
Contributor Author

@ball-hayden ball-hayden Apr 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand right, the purpose of these are to see if the timer is still valid after it has fired.

If we ask if the timer is valid inline here, CFRunLoopTimerIsValid always evaluates to true, I assume because we're still within the fire function.

Here I'm proposing a different method.

We track the invalidation and removal of timers (in NSTimer+DTXSpy.m), which means we know when a repeating timer is stopping. We also know that a timer that doesn't repeat is finished once we have called its callback (or selector).

Therefore, we can simply call untrack if we are not a recurring timer after we have called its callback.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand, this code checks whether the timer needs to be untracked (for example, as you mentioned - in case of a non-repeating timer) immediately after finishing the fire.

Therefore, we can simply call untrack if we are not a recurring timer after we have called its callback.

Generally, this sounds right, but I'm not sure if recurring / non-recurring is our only flow. What about calling to CFRunLoopTimerInvalidate on the 3rd fire or something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling CFRunLoopTimerInvalidate will result in an explicit call to untrack from the spy:

https://github.com/PlayerData/DetoxSync/blob/f5cd8cbde60311bb8e41df7c773850cd73c8f84b/DetoxSync/DetoxSync/Spies/NSTimer%2BDTXSpy.m#L108

Comment on lines 124 to 134
if(_target && _sel) {
IMP impl = [_target methodForSelector:_sel];
void (*func)(id, SEL, NSTimer*) = (void *)impl;
func(_target, _sel, _timer);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a small tidy-up recommended by https://stackoverflow.com/a/20058585

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment on lines -195 to +183
return @{
@"fire_date": _fireDate ? [_DTXTimerTrampoline._descriptionDateFormatter stringFromDate:_fireDate] : @"none",
@"time_until_fire": @(_timeUntilFire),
@"is_recurring": @(_repeats),
@"repeat_interval": @(_ti)
};
return @{
@"fire_date": _fireDate ? [_DTXTimerTrampoline._descriptionDateFormatter stringFromDate:_fireDate] : @"none",
@"time_until_fire": @(_timeUntilFire),
@"is_recurring": @(_repeats),
@"repeat_interval": @(_ti)
};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rest of the file is indented using tabs. Fixed for consistency (and editor happiness).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Legit 👍
To be honest, I hate tabs, but I prefer consistency over inconsistency.

@asafkorem
Copy link
Contributor

Thanks @ball-hayden, you definitely raised some interesting points. I need to dive deep into this to have a significant input on that subject. I'll start review this later today 🙂

Comment on lines -108 to +114
objc_setAssociatedObject(_displayLink, __DTXTimerTrampolineKey, self, OBJC_ASSOCIATION_RETAIN_NONATOMIC);
objc_setAssociatedObject(_displayLink, __DTXTimerTrampolineKey, self, OBJC_ASSOCIATION_RETAIN);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NONATOMIC could cause races when looking up the associated trampoline.

__weak NSTimer* _timer;
NSTimer* _timer;
Copy link
Contributor Author

@ball-hayden ball-hayden Apr 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ensure that we still have a reference to the _timer object until we are done.

This might be a bad idea - I'm a bit unsure as to whether this leaves us with a circular dependency between us and the timer for releasing?

@@ -112,47 +112,24 @@ - (void)setDisplayLink:(CADisplayLink*)displayLink
#endif
}

- (void)fire:(id)timer
- (void)fire
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can understand why the timer argument is redundant.

Comment on lines -117 to -142
//This is to ensure the timer is still valid after fire.
CFRunLoopRef runloop = CFRunLoopGetCurrent();
CFRunLoopMode mode = CFRunLoopCopyCurrentMode(runloop);
CFRunLoopPerformBlock(runloop, mode, ^{
if(CFRunLoopTimerIsValid((__bridge CFRunLoopTimerRef)timer) == NO)
{
[self untrack];

CFRelease(mode);

return;
}

CFRunLoopPerformBlock(runloop, mode, ^{
if(CFRunLoopTimerIsValid((__bridge CFRunLoopTimerRef)timer) == NO)
{
[self untrack];

CFRelease(mode);

return;
}

CFRelease(mode);
});
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand, this code checks whether the timer needs to be untracked (for example, as you mentioned - in case of a non-repeating timer) immediately after finishing the fire.

Therefore, we can simply call untrack if we are not a recurring timer after we have called its callback.

Generally, this sounds right, but I'm not sure if recurring / non-recurring is our only flow. What about calling to CFRunLoopTimerInvalidate on the 3rd fire or something?

Comment on lines +130 to 132
if(!_repeats) {
[self untrack];
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this will have the same result as the previous check.

Comment on lines -152 to -155
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Warc-performSelector-leaks"
[_target performSelector:_sel withObject:timer];
#pragma clang diagnostic pop
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you removed this? I guess it was there for a reason

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is replaced by ln 131-138 - gaining a reference to impl explicitly is recommended.

I think my previous comment was lost - sorry.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't, you're right #33 (comment), forgot about this for a moment 😅

@asafkorem
Copy link
Contributor

asafkorem commented Apr 20, 2022

@ball-hayden That's very impressive. You brought up a lot of questions here, I left a few comments with thoughts back. I'll continue the review next week.

Note that I'm not very proficient in this code territory, since I wasn't involved at any point in writing it, as you can see there isn't much documentation or logical separation into commits.. this is definitely a weak spot on this massive (and impressive) project (DetoxSync) 😅

Anyway I'll try my best to help in this investigation, another note that it might take me some time to answer as we are both under-capacity and the fact that it takes time to get into things 🙂

@ball-hayden
Copy link
Contributor Author

Thanks @asafkorem.

For the avoidance of doubt, I'm not expecting to merge this PR - it's entirely for discussion and ideas.

As I mentioned in wix/Detox#3207 (comment) I'm afraid I'm also having to leave this alone in work time - there isn't the business appetite to spend more time on it at the moment.

I'll continue to try and keep some thoughts out of hours though - it's an interesting problem and I'd love to know what the root cause actually is.

@asafkorem asafkorem changed the title Timer simplify Investigation: timers synchronization May 2, 2022
[self _untrackForParam:&_viewNeedsLayoutCount eventIdentifier:_DTXStringReturningBlock(identifier)];
[self _untrackForParam:&self->_viewNeedsLayoutCount eventIdentifier:_DTXStringReturningBlock(identifier)];
Copy link
Contributor

@asafkorem asafkorem May 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

being explicit in such cases is indeed better

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but except for being more explicit, is there any other goal in this change? What exactly is the fix here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asafkorem
Copy link
Contributor

@ball-hayden thanks again for this investigation and the refactoring suggestions! 🚀

I believe some of the fixes here can be extracted into smaller PRs.
If you'd like to do so, I would love to start merging some of the changes, like this one: 716e2a7 🙂

@asafkorem
Copy link
Contributor

Have you tried running all the changes with detox, specifically with our tests suite?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants