Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Force notifications which touch GUI elements onto main thread #504

Closed
wants to merge 9 commits into from

Conversation

EdLeming
Copy link
Contributor

This PR is an attempt fix the CRUS crash observed recently at Sussex. Debugging by Martti suggests the issue was line 978 of the TUBii model. This should be resolved by posting the note which calls this method on the main thread.

It may be that #499 was also due to the *ellieFireAction methods in the ELLIEController not posting to the main thread. These notes are picked up in the SNOPController and the resulting method calls make minor GUI changes (activate / deactivate buttons). This PR also addresses those cases.

@tlatorre-uchicago
Copy link

One thing that worries me about the recent issues with the keep alive at remote institutions is that it appears to be happening primarily at run starts. The issue is that a modal dialog will block the main thread. If this happens during the run start procedure, it's possible that the detector will not have triggers enabled when this modal pops up. If that's the case we won't be live until the shifter notices and clicks the button which could be ~30 sec to 1 minute.

Is the main goal of the keep alive signal to prevent the ellie systems from firing when Orca is not running? If that's the case, can it just automatically be restarted or is there some case in which you don't want it to be restarted?

… with red log message. Also autorestart at start of ELLIE run sequence
@EdLeming
Copy link
Contributor Author

EdLeming commented May 16, 2018

Yes, that is the main goal.

I have removed the pop-up and replaced with auto-restarts at the beginning of ELLIE flash sequences. I also added a red warning message so users are aware they can go attempt a restart from the ELLIE gui if they'd like to debug an issue.

@EdLeming
Copy link
Contributor Author

I'll test on the teststand in the morning

@@ -89,6 +89,7 @@ - (id) init {
connection = [[RedisClient alloc] init]; // Connection must be allocated before port and host name are set
portNumber = TUBII_DEFAULT_PORT;
strHostName = [[NSString alloc]initWithUTF8String:TUBII_DEFAULT_IP];
[connection setTimeout:2];

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the timeout is specified in milliseconds. Would you mind adding a comment to the top of setTimeout() to explain that in this PR too?

@tlatorre-uchicago
Copy link

How did the testing go with this? Is it ready to go?

@EdLeming
Copy link
Contributor Author

Hey. I tested yesterday. It didn't solve the TUBii, orphan issue, but ran smoothly that aside. I am confident it solved the issue it was originally posted for (i.e. the run modal crashed).

@EdLeming
Copy link
Contributor Author

I should add, while testing on the detector I also stepped through commenting out sections of the code associated with the culprit commit (be17c8e). I commented out all additions in the SNOPModel, SNOPController, TUBiiModel and TUBii related changes to the ELLIEController.m before handing over to afternoon crew for ramping. Even with all of those changes removed I was still seeing TUBii orphans at run restarts.

It's not clear at all what's happening here. All I can imagine is that some process has been added which affects the ordering of calls at run restart. Changing the TUBii client timeout to 2s and extending the time between ORCA->TUBii keep alive pulses from (0.5s to 5.0s) also had no effect.

Copy link

@tlatorre-uchicago tlatorre-uchicago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a few minor comments.

@@ -314,7 +314,7 @@ -(IBAction)tellieGeneralFireAction:(id)sender

[tellieGeneralFireButton setEnabled:NO];
[tellieGeneralStopButton setEnabled:YES];
[[NSNotificationCenter defaultCenter] postNotificationName:ORTELLIERunStartNotification object:nil userInfo:[self guiFireSettings]];
[[NSNotificationCenter defaultCenter] postNotificationOnMainThreadWithName:ORTELLIERunStartNotification object:nil userInfo:[self guiFireSettings]];

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are IBAction methods which means they are called when someone clicks a button right? If so, they are guaranteed to be on the main thread so you don't need to change this.

@@ -328,7 +328,7 @@ -(IBAction)tellieExpertFireAction:(id)sender
}

[tellieExpertStopButton setEnabled:YES];
[[NSNotificationCenter defaultCenter] postNotificationName:ORTELLIERunStartNotification object:nil userInfo:[self guiFireSettings]];
[[NSNotificationCenter defaultCenter] postNotificationOnMainThreadWithName:ORTELLIERunStartNotification object:nil userInfo:[self guiFireSettings]];

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here.

@@ -548,7 +548,7 @@ - (IBAction)amellieFireAction:(id)sender {

[amellieFireButton setEnabled:NO];
[amellieStopButton setEnabled:YES];
[[NSNotificationCenter defaultCenter] postNotificationName:ORAMELLIERunStartNotification object:nil userInfo:[self guiFireSettings]];
[[NSNotificationCenter defaultCenter] postNotificationOnMainThreadWithName:ORAMELLIERunStartNotification object:nil userInfo:[self guiFireSettings]];

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here.

@@ -89,6 +89,9 @@ - (id) init {
connection = [[RedisClient alloc] init]; // Connection must be allocated before port and host name are set
portNumber = TUBII_DEFAULT_PORT;
strHostName = [[NSString alloc]initWithUTF8String:TUBII_DEFAULT_IP];
// This is being extended in an attempt to resolve the issues with ORPHANs
// from missing TUBii information at run rollovers.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This didn't resolve the orphan issue right? If so, can you delete this comment. Not sure if we want to keep the timeout extended or not. Presumably that was what was causing the run modal issue to be raised, so it could still help for remote institutions, so maybe you can just update the comment to say that the latency from remote shift stations is big enough that we are increasing it in an attempt to prevent timeouts from remote shift stations.

@@ -141,6 +144,9 @@ - (id) initWithCoder:(NSCoder *)aCoder {

//Connection must be made before port and host name are set.
connection = [[RedisClient alloc] initWithHostName:strHostName withPort:portNumber];
// This is being extended in an attempt to resolve the issues with ORPHANs
// from missing TUBii information at run rollovers.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here.

@EdLeming
Copy link
Contributor Author

Thanks Tony. Those notifications which I've updated to be on the main thread call a void function, not an IBAction. It was separated from the IBAction in the SNOPController as they two need to load their TELLIE specific settings from different places and the logic required in a single function was quite messy. I kept the void function (which picks up the notification) in the SNOPController for a couple of reasons:

  1. So starting a TELLIE sequence from either GUI will always activate stop buttons in the control panel - making the actions a user can perform to stop a flash sequence consistent no matter which GUI it was started from. (Currently it's only the ELLIE gui which will ever get used, but if the system stabalises this will no longer be the case)

  2. So I could start a little disco ball spinning in the control panel at the start of a flash sequence. This feature actually got removed due to a clash with one of Javi's PRs and I never put it back. I will when I get time.

@tlatorre-uchicago
Copy link

I'm not sure I understand. What do you mean that they call a void function? Are the three methods tellieGeneralFireAction, tellieExpertFireAction, and amellieFireAction called from methods in separate threads?

@EdLeming
Copy link
Contributor Author

@tlatorre-uchicago
Copy link

Ok, but since these three methods (tellieGeneralFireAction, tellieExpertFireAction, and amellieFireAction) are all IBAction's and are only called when the user hits a button they are guaranteed to be on the main thread, so when you post the notification it's also guaranteed to be on the main thread, so you don't need to explicitly change the notifications to only post on the main thread.

@EdLeming
Copy link
Contributor Author

Ahh, OK. I see what you're saying. I mis-understood your initial comment. I thought you were referring to the functions they were calling, not where they were called from.

I've updated.

@tlatorre-uchicago
Copy link

merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants