Better trick to get access to other-Space windows #447

Open
lwouis opened this issue Jul 21, 2020 · 91 comments
Labels
enhancement New feature or request need breakthrough Need a breakthrough idea to move forwards

Comments

@lwouis
Owner

lwouis commented Jul 21, 2020

Is your feature suggestion related to a problem? Please describe.
When AltTab starts, there is a flash of content as windows from other Spaces are temporarily brought into the current Space through a private API. This is needed to be able to focus them later. However, it is janky: the flashing confuses the user, and the approach is limited because it has a 1s budget to grab the windows, after which any window that was not grabbed remains unknown to AltTab.

Describe the solution you'd like
HyperSwitch is able to focus windows from other Spaces after starting. It does not flash content doing so, so they must have a better way.

@lwouis lwouis added the enhancement New feature or request label Jul 21, 2020
@lwouis
Owner Author

lwouis commented Jul 21, 2020

This investigation has already been discussed, and there is lots of interesting information in #431

@koekeishiya

koekeishiya commented Jul 21, 2020

I downloaded HyperSwitch to test, and it appears to me that they do not actually have a real fix for this issue. They actually appear to somehow freeze a majority of screen updates and other actions while HyperSwitch is launching. I assume that during this time they actually use the same trick that you have implemented, but they try hard to avoid the visual flicker. You can see artifacts or weird behaviour if you try to switch a space or open mission-control during this split second during its launch.

Edit: Actually, they place invisible windows on each space that they use to first make that space focus, and then they re-focus the target window. I managed to spot the active application in the menubar saying "HyperSwitch" during the space transition, and immediately after the transition ends, it focuses the window I selected. Then if I switch back, and switch to the same window again, it does not show "HyperSwitch" during the transition, but the name of the actual application of the window I selected, as they now have the AXUIElementRef to work with.

@lwouis lwouis added the need breakthrough Need a breakthrough idea to move forwards label Jul 22, 2020
@lwouis
Owner Author

lwouis commented Jul 22, 2020

Damn @koekeishiya you're clever! I suspected the invisible windows trick, but I ran the accessibility API on HyperSwitch itself, and found it had 0 windows. I forgot that this API doesn't list windows from other Spaces.

I'll try to implement this trick. It's way smoother than flashing the screen 👍

@lwouis
Owner Author

lwouis commented Aug 26, 2020

Thinking about it, the trick of invisible windows on all Spaces is not good enough. We need to have AXUIElement for all windows. This gives us the correct titles, role, subrole, etc. We use these as heuristics to decide if the windows should be shown or not (see #456).

By accepting that we sometimes don't have an AXUIElement for a window, we need to implement a whole mirror logic using the CG APIs: get the title, filter windows based on other criteria specific to the CG APIs, etc.

There must be a better way. I wonder if it's possible to activate a window on another Space without switching to that Space. Maybe in that scenario, we get access to that Space's windows.

@metacodes
Contributor

It seems that, from macOS 12.3.1 onwards, we cannot get the AXUIElement of a window on another Space by using the private API CGSAddWindowsToSpaces. From macOS 12.3 onwards, we also cannot get the AXUIElement of a fullscreen window with it.

@lwouis
Owner Author

lwouis commented Apr 18, 2022

@metacodes it seems you found the root cause for #1324

@metacodes
Contributor

@metacodes it seems you found the root cause for #1324

I found this problem yesterday while testing the code (#1484); it worked perfectly fine on 12.1. Regarding this private API, I read some of your earlier discussions, did some research, and have partially solved the problem so far. There are still some issues to resolve, and I hope to find a clean way to solve this.

@jkelleyrtp

jkelleyrtp commented Apr 27, 2022

Thinking about it, the trick of invisible windows on all Spaces is not good enough. We need to have AXUIElement for all windows. This gives us the correct titles, role, subrole, etc. We use these as heuristics to decide if the windows should be shown or not (see #456).

By accepting that we sometimes don't have an AXUIElement for a window, we need to implement a whole mirror logic using the CG APIs: get the title, filter windows based on other criteria specific to the CG APIs, etc.

There must be a better way. I wonder if it's possible to activate a window on another Space without switching to that Space. Maybe in that scenario, we get access to that Space's windows.

I figured this one out. They cache which windows are on the space after you leave the space and monitor for closing events given the cached AXUIElementRefs.
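As a rough illustration of that caching idea, here is a minimal sketch in Python (chosen for readability; a real implementation would hold AXUIElementRefs and subscribe to accessibility notifications). The class and method names are hypothetical, and a plain object stands in for the AX reference.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class AxRefCache:
    """Remembers the last accessibility reference seen for each window id.

    Hypothetical helper: a plain Python object stands in for an AXUIElementRef.
    """

    _refs: Dict[int, object] = field(default_factory=dict)

    def on_window_seen(self, window_id: int, ax_ref: object) -> None:
        # Called while the window's Space is active and the AX API can list it.
        self._refs[window_id] = ax_ref

    def on_window_closed(self, window_id: int) -> None:
        # Called from a close notification so that a stale ref is never reused.
        self._refs.pop(window_id, None)

    def lookup(self, window_id: int) -> Optional[object]:
        # Returns None for windows that were never seen or have been closed.
        return self._refs.get(window_id)
```

The cache only helps for Spaces the user has already visited, which is why it does not cover windows that exist before the switcher is launched.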

You can also get a list of windows and spaces from a combination of CGS APIs, cross-referencing, and the com.apple.spaces plist. There's also the screenshot API which alt-tab could take advantage of.

When you switch to a particular window in their hyperswitch list, it actually switches to its own window on that screen and then gives up focus to the window the user targeted. Fortunately, this doesn't use any private APIs.

Here's my proof-of-concept. I wrote it in Rust w/ Tao as the window manager.

Note that since my use case is just cmd+tab into alternative spaces, I don't bother focusing a particular window. However, I do keep the list of windows around.

Now if we could only figure out how to get rid of the space switching animation....

http://github.com/jkelleyrtp/kauresel

Screen.Recording.2022-04-27.at.4.08.19.PM.mov

@koekeishiya

Now if we could only figure out how to get rid of the space switching animation....

This is impossible if you don't inject code into Dock.app: koekeishiya/yabai#1235 (comment) You literally need to patch the function inside Dock.app that is responsible for doing this animation. You can use private APIs (https://github.com/koekeishiya/yabai/blob/master/src/osax/payload.m#L59), but macOS will be out of sync because there are internal data structures inside the Dock's process-space that keep track of (and render/display) this state. If you use those APIs you need to kill the Dock process to trigger a reload of the system state.

@lwouis
Owner Author

lwouis commented Apr 28, 2022

@jkelleyrtp I didn't understand your suggestion fully.

Here is the challenge:

  • AltTab starts
  • There is already a window on another Space
  • User presses alt+tab
  • AltTab has to show the window on the other Space even though the user has never been to that Space. We can't use other APIs than AX because our window discrimination logic needs the AX ref to decide if the window is to be shown or not

How does your POC solve that?

@jkelleyrtp

jkelleyrtp commented Apr 28, 2022

@jkelleyrtp I didn't understand your suggestion fully.

Here is the challenge:

  • AltTab starts
  • There is already a window on another Space
  • User presses alt+tab
  • AltTab has to show the window on the other Space even though the user has never been to that Space. We can't use other APIs than AX because our window discrimination logic needs the AX ref to decide if the window is to be shown or not

How does your POC solve that?

From what I understood about the problem

  • We want to cmd-tab to a window that's offscreen
  • The only way to bring a window into focus or to raise it is through AX ref

At first glance you can't get an AXRef without seeing the window first and then caching it. However, if the AXRef is used only to focus the window from a foreign space, then we can circumvent the limitation by placing our own windows in every space.

This works if your discrimination logic can get by with the information gleaned from:

  • com.apple.spaces
  • CGWindowListCopyWindowInfo (with kCGWindowListOptionAll)
  • CGSCopySpacesForWindows

CGWindowListCopyWindowInfo produces a bunch of helpful stuff

  • Key: kCGWindowLayer
  • Key: kCGWindowAlpha
  • Key: kCGWindowMemoryUsage
  • Key: kCGWindowIsOnscreen
  • Key: kCGWindowSharingState
  • Key: kCGWindowOwnerPID
  • Key: kCGWindowNumber
  • Key: kCGWindowOwnerName
  • Key: kCGWindowStoreType
  • Key: kCGWindowBounds
  • Key: kCGWindowName

If you need the AXRef to perform discrimination logic then tossing your own windows into each space doesn't work. But if you can get by without it (by filtering on titles, spaces, or layers), then you can use your own AXRefs from hidden windows to switch spaces. Once you're at the new space, you can bring the window forward immediately now that its AXRef is available to you.
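To make the "get by without it" option concrete, here is a minimal sketch of a CG-only filter, written in Python over plain dictionaries shaped like the entries CGWindowListCopyWindowInfo returns. The size thresholds and the function names are assumptions made up for illustration; this is nowhere near as thorough as an AX-based heuristic.

```python
from typing import Any, Dict, List

# Dictionary keys as listed above for CGWindowListCopyWindowInfo entries.
LAYER = "kCGWindowLayer"
ALPHA = "kCGWindowAlpha"
BOUNDS = "kCGWindowBounds"

# Hypothetical thresholds, picked only for this example.
MIN_WIDTH = 100
MIN_HEIGHT = 50


def looks_like_real_window(info: Dict[str, Any]) -> bool:
    """Rough CG-only guess at whether an entry is a user-facing window."""
    if info.get(LAYER, 0) != 0:
        # Normal application windows sit on layer 0; menus and overlays do not.
        return False
    if info.get(ALPHA, 1.0) <= 0.0:
        # Fully transparent windows are usually helpers, not content.
        return False
    bounds = info.get(BOUNDS, {})
    if bounds.get("Width", 0) < MIN_WIDTH or bounds.get("Height", 0) < MIN_HEIGHT:
        # Tiny windows are likely popups or tooltips.
        return False
    return True


def filter_windows(infos: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keeps only the entries that pass the rough heuristic above."""
    return [info for info in infos if looks_like_real_window(info)]
```

A filter like this can only approximate what role and subrole give you through AX, so it will misclassify some windows.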

My PoC creates hidden windows (well not in the gif but it does) and then sends them to all of the unique spaces gathered from com.apple.spaces and CGSCopySpacesForWindows. For the app I'm trying to build, this is enough, since all I want is to switch between virtual desktops with cmd+tab with a grid-based layout.

@metacodes
Contributor

After some decompiling, I found out that Contexts.app switches Spaces by using an invisible window. I think we don't actually need to get the AXUIElement of windows on other Spaces when AltTab is launched; we can delay getting the AXUIElement until we switch to that Space. We can move the helper window to the Space we want by using CGSMoveWindowsToManagedSpace, and then switch to that Space. After that, we can get the AXUIElement and focus the window we want. The implementation is in PR #1484.
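The flow described above can be sketched roughly as follows. This is a Python simulation, not the code from the PR: FakeWindowServer and all of its methods are hypothetical stand-ins for the private CGS calls and the AX API, and they only model the fact that AX sees windows on the current Space.

```python
from typing import Dict, List, Optional


class FakeWindowServer:
    """In-memory stand-in for the real CGS/AX calls; every method is hypothetical."""

    def __init__(self, windows_by_space: Dict[int, List[int]], current_space: int):
        self.windows_by_space = windows_by_space
        self.current_space = current_space
        self._helper_space = current_space
        self.log: List[str] = []

    def move_helper_window_to_space(self, space_id: int) -> None:
        # Real code would use a private API such as CGSMoveWindowsToManagedSpace.
        self._helper_space = space_id
        self.log.append(f"move helper to space {space_id}")

    def activate_helper_window(self) -> None:
        # Focusing our own window makes the system switch to the Space it is on.
        self.current_space = self._helper_space
        self.log.append(f"switched to space {self.current_space}")

    def ax_ref_for(self, window_id: int) -> Optional[str]:
        # Models the AX limitation: only windows on the current Space are visible.
        if window_id in self.windows_by_space.get(self.current_space, []):
            return f"ax:{window_id}"
        return None

    def raise_window(self, ax_ref: str) -> None:
        self.log.append(f"raise {ax_ref}")


def focus_window(server: FakeWindowServer, window_id: int, space_id: int,
                 ax_cache: Dict[int, str]) -> bool:
    """Focuses a window, fetching its AX reference lazily if needed."""
    ax_ref = ax_cache.get(window_id) or server.ax_ref_for(window_id)
    if ax_ref is None:
        # No reference yet: hop to the target Space via the helper window first.
        server.move_helper_window_to_space(space_id)
        server.activate_helper_window()
        ax_ref = server.ax_ref_for(window_id)
    if ax_ref is None:
        return False
    ax_cache[window_id] = ax_ref
    server.raise_window(ax_ref)
    return True
```

Note that this only covers focusing; titles and filtering for never-visited Spaces would still have to come from somewhere else.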

@metacodes
Contributor

metacodes commented Apr 28, 2022

The following code is from Contexts.app, which uses it to switch Spaces.

-(void)makeFrontProcess {
    SetFrontProcessWithOptions(&self->_processSerialNumber, 0x1);
    return;
}

/* @class CWSActivateWindowOperation */
-(void)changeToSpace:(void *)arg2 {
    var_60 = [arg2 retain];
    rax = objc_alloc_init(@class(CWSActivateWindowOperationHelperWindow));
    r15 = *_objc_msgSend;
    [self setHelperWindow:rax];
    [rax release];
    rax = [self helperWindow];
    rax = [rax retain];
    [rax makeKeyAndOrderFront:0x0]; //the helperWindow is NSWindow
    [rax release];
    var_58 = self;
    rax = [self helperWindow];
    rax = [rax retain];
    rdx = [rax windowNumber]; //NSWindow.windowNumber
    r13 = [[CTCoreGraphics spacesForWindow:rdx withSpaceMask:0x7] retain];
    [rax release];
    rax = [self helperWindow];
    rax = [rax retain];
    r15 = rax;
    r12 = [rax windowNumber];
    var_38 = var_60;
    rax = [NSArray arrayWithObjects:rdx count:0x1];
    rax = [rax retain];
    rdx = r12;
    var_50 = r13;
    r12 = *_objc_msgSend;
    [CTCoreGraphics moveWindow:rdx fromSpaces:r13 toSpaces:rax];
    [rax release];
    [r15 release];
    rdx = var_60;
    rbx = [[CTCoreGraphics screenForSpace:rdx] retain];//return screen id
    r14 = r12;
    r12 = [[var_58 helperWindow] retain];
    var_68 = rbx;
    if (rbx != 0x0) { //not nil
            rdx = @selector(frame);//get frame origin
            objc_msgSend_stret(&var_90, rbx, rdx);
            intrinsic_movsd(xmm0, var_90);
            intrinsic_movsd(xmm1, *(&var_90 + 0x8));
    }
    else {
            intrinsic_movaps(var_80, 0x0);
            intrinsic_movaps(var_90, 0x0);
    }
    var_30 = **___stack_chk_guard;
    (r14)(r12, @selector(setFrameOrigin:), rdx);//set helperWindow's frame
    [r12 release];
    // set self.helperWindow's frame,and NSWindow.makeKeyAndOrderFront
    (r14)([(r14)(var_58, @selector(helperWindow), rdx) retain], @selector(makeKeyAndOrderFront:), 0x0);
    [rax release];
     // NSApp.activateIgnoringOtherApps
    (r14)(**_NSApp, @selector(activateIgnoringOtherApps:), 0x1);
    // print some logs
    var_48 = @"toSpace";
    var_40 = var_60;
    rax = (r14)(@class(NSDictionary), @selector(dictionaryWithObjects:forKeys:count:), &var_40, &var_48, 0x1);
    (r14)(var_58, @selector(logInfo:data:), @"Changing space complete.", [rax retain]);
    [rax release];
    [var_68 release];
    [var_50 release];
    [var_60 release];
    if (**___stack_chk_guard != var_30) {
            __stack_chk_fail();
    }
    return;
}

//helperWindow init
/* @class CWSActivateWindowOperationHelperWindow */
-(void *)init {
    var_40 = self;
    *(&var_40 + 0x8) = *0x10053b878;
    rax = [[&var_40 super] init];
    rbx = rax;
    if (rax != 0x0) {
            intrinsic_movaps(var_30, 0x0);
            [rbx setFrame:0x1 display:intrinsic_movaps(var_20, intrinsic_movaps(0x0, *(int128_t *)0x100420d90))];
            rsp = (rsp - 0x20) + 0x20;
            [rbx setStyleMask:0x0];
            [rbx setIgnoresMouseEvents:0x1];
            [rbx setHidesOnDeactivate:0x1];
            [rbx setTitle:@"Contexts H"];
            [rbx retain];
    }
    [rbx release];
    rax = rbx;
    return rax;
}

/* @class CTCoreGraphics */
+(void *)spacesForWindow:(unsigned int)arg2 withSpaceMask:(int)arg3 {
    r14 = arg3;
    rbx = arg2;
    if ([self privateApiAvailable] != 0x0) {
            r15 = (*qword_10054e740)(); // CGSMainConnectionID()
            rax = [NSNumber numberWithUnsignedInt:rbx];
            rax = [rax retain];
            var_38 = rax;
            // CGSCopySpacesForWindows()
            rbx = qword_10054e788(r15, r14, [NSArray arrayWithObjects:rbx count:0x1]);
            [rax release];
    }
    else {
            rbx = [[NSArray array] retain];
    }
    if (**___stack_chk_guard == **___stack_chk_guard) {
            rax = [rbx autorelease];
    }
    else {
            rax = __stack_chk_fail();
    }
    return rax;
}

/* @class CTCoreGraphics */
+(void)moveWindow:(unsigned int)arg2 fromSpaces:(void *)arg3 toSpaces:(void *)arg4 {
    r12 = arg2;//helperWindow.windowNumber
    var_48 = [arg3 retain];
    r15 = [arg4 retain];
    if ([self privateApiAvailable] != 0x0) {
            r14 = (*qword_10054e740)(); // CGSMainConnectionID()
            var_38 = [[NSNumber numberWithUnsignedInt:r12] retain];
            // CGSRemoveWindowsFromSpaces(), remove helper window from that space
            qword_10054e798(r14, [NSArray arrayWithObjects:r12 count:0x1], var_48);
            [rax release];
            r13 = (*qword_10054e740)(); // CGSMainConnectionID()
            var_40 = [[NSNumber numberWithUnsignedInt:r12] retain];
            // CGSAddWindowsToSpaces(), add helper window to that space
            qword_10054e790(r13, [NSArray arrayWithObjects:r12 count:0x1], r15);
            [rax release];
    }
    var_30 = **___stack_chk_guard;
    [r15 release];
    [var_48 release];
    if (**___stack_chk_guard != var_30) {
            __stack_chk_fail();
    }
    return;
}

@metacodes
Contributor

HyperSwitch.app uses the following code to switch Spaces. I haven't read all of it yet: the private API names are encoded, so you can't directly see which APIs are called. I've decoded some of the private APIs they use, and I need more time to investigate further.

/* @class OCWindow */
-(void)bringToFront:(char)arg2 {
    rbx = arg2;
    r15 = self;
    rax = [self ownerPid];
    if (rax == 0x0) goto .l1;

loc_10003d3f2:
    rax = GetProcessForPID(rax, &var_40);
    if (rax != 0x0) goto .l1;

loc_10003d405:
    if ([[r15 ownerName] isEqualToString:@"X11"] == 0x0) goto loc_10003d46f;

loc_10003d431:
    if (*(int32_t *)dword_10017eecc >= 0x2) {
            NSLog(@"We can't raise X11 windows, bringing XQuartz to front instead");
    }
    [[r15 ownerApplication] activateWithOptions:0x3];
    return;

.l1:
    return;

loc_10003d46f:
    rax = [r15 axWindow];
    r13 = rax;
    if (rax != 0x0) {
            AXUIElementPerformAction(r13, @"AXRaise");
    }
    var_30 = rbx;
    if (rbx != 0x0) {
            var_2C = 0x1;
            if ([r15 isVisible] == 0x0) {
                    r14 = [OCWindow currentSpaceID];
                    rax = [r15 space];
                    if ((rax != 0x0) && (rax != r14)) {
                            sub_10003bef2(rax, 0x1);
                            if (r13 == 0x0) {
                                    usleep(0x493e0);
                            }
                            var_2C = 0x0;
                    }
            }
    }
    else {
            var_2C = 0x1;
    }
    if (r13 != 0x0) goto loc_10003d54d;

loc_10003d51e:
    rbx = 0xa;
    goto loc_10003d523;

loc_10003d523:
    usleep(0x186a0);
    r13 = [r15 axWindow];
    rbx = rbx - 0x1;
    if (rbx == 0x0) goto loc_10003d544;

loc_10003d53d:
    if (r13 == 0x0) goto loc_10003d523;

loc_10003d54d:
    var_38 = r15;
    xmm0 = intrinsic_movss(xmm0, *(int32_t *)float_value_1);
    AXUIElementSetMessagingTimeout(r13, xmm0);
    r15 = 0x4;
    goto loc_10003d575;

loc_10003d575:
    rax = AXUIElementPerformAction(r13, @"AXRaise");
    if (rax == 0x0) goto loc_10003d5e6;

loc_10003d584:
    r14 = rax;
    if (*(int32_t *)dword_10017eecc > 0x0) {
            NSLog(@"Couldn't raise (errno: %d), trying again ...", r14);
    }
    xmm0 = intrinsic_movss(xmm0, *(int32_t *)float_value_3);
    AXUIElementSetMessagingTimeout(r13, xmm0);
    r15 = r15 - 0x1;
    if (r15 != 0x0) goto loc_10003d575;

loc_10003d5b2:
    AXUIElementSetMessagingTimeout(r13, 0x0);
    if (r14 == 0xffff9d8c) {
            r15 = var_38;
            if (*(int32_t *)dword_10017eecc > 0x0) {
                    NSLog(@"AXErrorCannotComplete");
            }
    }
    else {
            rax = SetFrontProcessWithOptions(&var_40, 0x1);
            r15 = var_38;
    }
    goto loc_10003d60a;

loc_10003d60a:
    rdx = @"X11";
    rcx = var_2C | (var_30 == 0x0 ? 0x1 : 0x0);
    if (rcx == 0x0) {
            r14 = dispatch_get_global_queue(0xfffffffffffffffe, 0x0);
            r12 = r15;
            r15 = *__NSConcreteStackBlock;
            var_90 = r15;
            *(&var_90 + 0x8) = 0xffffffffc0000000;
            *(&var_90 + 0x10) = sub_10003d713;
            *(&var_90 + 0x18) = 0x100141058;
            *(&var_90 + 0x20) = var_40;
            dispatch_after(dispatch_time(0x0, 0x11e1a300), r14, &var_90);
            var_68 = r15;
            r15 = r12;
            *(&var_68 + 0x8) = 0xffffffffc2000000;
            *(&var_68 + 0x10) = sub_10003d726;
            *(&var_68 + 0x18) = 0x100140e80;
            *(&var_68 + 0x20) = r12;
            rax = dispatch_time(0x0, 0x1dcd6500);
            rdx = &var_68;
            dispatch_after(rax, r14, rdx);
    }
    [[NSNotificationCenter defaultCenter] postNotificationName:@"OCWindowBroughtToFrontNotification" object:r15];
    return;

loc_10003d5e6:
    AXUIElementSetMessagingTimeout(r13, 0x0);
    rax = SetFrontProcessWithOptions(&var_40, 0x1);
    r15 = var_38;
    goto loc_10003d60a;

loc_10003d544:
    if (r13 == 0x0) goto loc_10003d60a;
}
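For readers who find the decompiled listing hard to follow, here is my reading of its retry structure as a Python sketch. The constants come from the listing (0xa polls, usleep(0x186a0) which is 100 ms, 0x4 raise attempts, and 0xffff9d8c which is -25204 as a signed 32-bit value). The function and parameter names are hypothetical, the callbacks stand in for the AX and process calls, and the Space-switching branch, timeouts and delayed blocks are left out.

```python
import time
from typing import Callable, Optional

AX_POLL_ATTEMPTS = 10              # rbx = 0xa in the listing
AX_POLL_INTERVAL_SECONDS = 0.1     # usleep(0x186a0), i.e. 100000 microseconds
RAISE_ATTEMPTS = 4                 # r15 = 0x4 in the listing
AX_ERROR_CANNOT_COMPLETE = -25204  # 0xffff9d8c read as a signed 32-bit integer


def bring_to_front(get_ax_window: Callable[[], Optional[object]],
                   raise_window: Callable[[object], int],
                   activate_app: Callable[[], None],
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Returns True if the raise succeeded (the original method returns nothing)."""
    ax = get_ax_window()
    attempts = AX_POLL_ATTEMPTS
    # Poll until the AX window reference becomes available, or give up.
    while ax is None and attempts > 0:
        sleep(AX_POLL_INTERVAL_SECONDS)
        ax = get_ax_window()
        attempts -= 1
    if ax is None:
        return False

    err = 0
    for _ in range(RAISE_ATTEMPTS):
        err = raise_window(ax)  # stands in for AXUIElementPerformAction(AXRaise)
        if err == 0:
            activate_app()      # stands in for SetFrontProcessWithOptions
            return True
    # After repeated failures the app is still activated, unless the error
    # was "cannot complete".
    if err != AX_ERROR_CANNOT_COMPLETE:
        activate_app()
    return False
```

The polling loop is the interesting part: after switching Spaces, the code simply waits for the AX reference to appear instead of requiring it up front.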

@lwouis
Owner Author

lwouis commented Apr 28, 2022

@jkelleyrtp @metacodes first of all, thank you for digging into these advanced tricks and trying to find a breakthrough. I also decompiled the other apps to try to understand how they do it. I never got any secret trick though to be honest. My reverse-engineering skills are pretty low.

Now, I'd like to quote myself again, and please read carefully what I'm talking about:

Thinking about it, the trick of invisible windows on all Spaces is not good enough. We need to have AXUIElement for all windows. This gives us the correct titles, role, subrole, etc. We use these as heuristics to decide if the windows should be shown or not (see #456).

By accepting that we sometimes don't have an AXUIElement for a window, we need to implement a whole mirror logic using the CG APIs: get the title, filter windows based on other criteria specific to the CG APIs, etc.

There must be a better way. I wonder if it's possible to activate a window on another Space without switching to that Space. Maybe in that scenario, we get access to that Space's windows.

Please follow the link and see how complex the heuristic to decide whether a window is real or not is. Please look at the current implementation.

In addition to detecting real/fake windows, as I said in my original quote, there is the issue of the window metadata like its title. If we use an API other than AX to get window titles, then the title will suddenly change once the user visits the Space with that window. Essentially we mislead them until we get the AXref, from which point we have reliable data to show.

Oh and also after checking out the pull-request, I also realize we don't know how to deal with other AX actions: closing a window, minimizing/de-minimizing, fullscreening. If we don't have the AXref, we can't do it, even with the invisible window trick.

So in short: yes, invisible windows are a workaround for focusing windows without their AXref, but we still need another workaround for window titles and for window detection. It doesn't deal with the whole problem, just the focus part. It's not good enough.

@metacodes how do you think apple shortcuts gets windows data? Maybe you could decompile and look?

@metacodes
Contributor

@metacodes how do you think apple shortcuts gets windows data? Maybe you could decompile and look?

Wow, that looks very cool! I've got something new to work on. Actually, my reverse-engineering skills are also pretty low; I only bought Hopper to solve AltTab problems, so don't expect too much from me. 😄 But reverse-engineering is very interesting, so it helps me pass the boring time during the epidemic (COVID-19).

@metacodes
Contributor

metacodes commented Apr 28, 2022

@metacodes how do you think apple shortcuts gets windows data? Maybe you could decompile and look?

@lwouis I just tried Apple Shortcuts on macOS 12.3.1. It can't show windows on other Spaces, only the windows on the current Space. Bad news.

@lwouis
Owner Author

lwouis commented Apr 28, 2022

Here's my attempt at summarizing the situation, regarding the goal of this ticket:

Why we need to use the AX API

It's tempting to think of solutions involving alternative APIs (e.g. CG APIs, AppleScript, system .plist files, CLI binaries, Automator.app, Shortcuts.app, etc). Here are the things that we need to do, which the AX APIs can deliver and these alternatives can't:

  • Having the correct name for a window
  • Being able to filter out windows that shouldn't be shown (e.g. HUDs, small popups, dropdown menus, etc)
  • Being able to focus a window
  • Being able to close a window
  • Being able to minimize/de-minimize a window
  • All the above, but on windows that are on another Space, minimized, from a hidden app, on another display, etc

How to get the AX references

There are only 2 ways that I know to get the AX reference of a window on another Space:

  • Bring the window to the current Space, using a private API
  • Switch to that Space

The first method is what AltTab does currently. It creates a flash of content at launch (see OP). It also has the issue of being broken from macOS 12.3 onwards after Apple broke the CGSAddWindowsToSpaces API. However it seems that we could simply replace it with CGSMoveWindowsToManagedSpace.

The second method has the problem that when switching to a Space, there is a long animation that can't be avoided.


Here's the situation. Now it's up to us to find a breakthrough workaround.

@jkelleyrtp

jkelleyrtp commented Apr 28, 2022

I haven't dived too deep into AX vs CG, but CGSCopySpacesForWindows provides a lot of information. For offscreen windows I don't think you'd run into issues like popups? I imagine you could populate the cache with CGSCopySpacesForWindows first and then update it with more accurate AX information as you visit those spaces. For me, CGSCopySpacesForWindows solves (approximately, can be updated later with AXrefs) these two issues:

  • Having the correct name for a window
  • Being able to filter out windows that shouldn't be shown (e.g. HUDs, small popups, dropdown menus, etc)

These two can be solved by bringing the window from a foreign space to the current one and then performing the action after getting the AX ref (or using a cached ref if it exists):

  • Being able to close a window
  • Being able to minimize/de-minimize a window

This one can be solved with an invisible window (or with cached ax ref)

  • Being able to focus a window

I think using a rough heuristic and then populating it with updated information would at least solve the issue for me where alt-tab doesn't show any of my apps when I launch it, and I have to dance between desktops. It also seems like some of my apps never make it into the carousel, hence why I've been digging into the alt-tab source.
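Here is a minimal sketch of that "rough first, accurate later" idea, in Python for brevity. WindowRegistry, WindowRecord and their methods are hypothetical names; the CG and AX calls that would feed them are not shown.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class WindowRecord:
    window_id: int
    title: str
    source: str  # "cg" for provisional data, "ax" for authoritative data
    ax_ref: Optional[object] = None


class WindowRegistry:
    """Hypothetical registry that starts from CG data and upgrades to AX data."""

    def __init__(self) -> None:
        self.records: Dict[int, WindowRecord] = {}

    def seed_from_cg(self, window_id: int, title: str) -> None:
        # Provisional data must never overwrite what AX has already told us.
        existing = self.records.get(window_id)
        if existing is not None and existing.source == "ax":
            return
        self.records[window_id] = WindowRecord(window_id, title, "cg")

    def upgrade_from_ax(self, window_id: int, title: str, ax_ref: object) -> None:
        # Called once the window's Space has been visited and AX can see it.
        self.records[window_id] = WindowRecord(window_id, title, "ax", ax_ref)

    def can_act_on(self, window_id: int) -> bool:
        # Close/minimize need the AX reference; CG-only records cannot do that.
        record = self.records.get(window_id)
        return record is not None and record.ax_ref is not None
```

The moment `upgrade_from_ax` runs is exactly when the title shown to the user can suddenly change, which is the drawback raised earlier in the thread.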

@metacodes
Contributor

@jkelleyrtp Maybe take a look at the code review comments in PR #1484. We can close/minimize/de-minimize/focus a window after we switch to that Space, but it's not ideal.

@metacodes
Contributor

metacodes commented Apr 29, 2022

@lwouis I have an idea: could we develop a daemon, like the WindowServer process, that is started before the user logs in? It could collect the AXRefs of all programs as they start, similar to an AXRef state machine, so that it holds the AXRefs of all programs once the user has logged in. When AltTab needs to operate on a program for which it has no AXRef, it sends a request to the daemon to perform the operation on its behalf.

We can't put all the logic into the daemon. On the one hand, I don't know whether there are API limitations for such a daemon. On the other hand, if we need to update the daemon frequently, the user will have to restart the computer each time, otherwise there is no way to manage all the AXRefs. The daemon would just keep the AX references and perform simple window operations, like closing, minimizing, maximizing, etc.

Also, I've only looked briefly at daemons as a technology, and I'm not sure whether one can be started before the user logs in, or whether API limitations would prevent it from getting AXRefs. https://developer.apple.com/library/archive/documentation/MacOSX/Conceptual/BPSystemStartup/Chapters/Introduction.html

@lwouis
Owner Author

lwouis commented Apr 29, 2022

That is a very clever idea @metacodes!

I think it can work actually. I think technically there are no blockers for your idea. Yesterday I removed some values from the AltTab LaunchAgent (that launches AltTab at login) and, in a fun coincidence, I noticed that the documentation said:

RunAtLoad: Running the job when it is loaded

This key is used to start the job as soon as it has been loaded. For daemons this means execution at boot time, for agents execution at login.

We could indeed use a global daemon at computer boot instead of a LaunchAgent (here an app) at login, where we are in a race with the other login apps.
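For reference, a launchd job with RunAtLoad is just a small property list. The sketch below builds one with Python's plistlib; the label and binary path are made up for the example, and whether a process started this way can actually use the AX API is still the open question raised in this thread.

```python
import plistlib


def launch_daemon_plist(label: str, program: str) -> bytes:
    """Builds a minimal launchd job definition that starts as soon as it is loaded."""
    job = {
        "Label": label,                 # unique job identifier
        "ProgramArguments": [program],  # executable path plus its arguments
        "RunAtLoad": True,              # boot time for daemons, login for agents
    }
    return plistlib.dumps(job)


# Hypothetical label and path, used only for illustration.
data = launch_daemon_plist(
    "com.example.windowtracker",
    "/usr/local/bin/windowtracker",
)
print(data.decode("utf-8"))
```

The same dictionary works for an agent or a daemon; what differs is the directory the file is installed in (a LaunchAgents or a LaunchDaemons folder) and therefore when launchd loads it.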

Having a "backend" to track OS windows state, and the app being a UI to see and order it, this has been discussed a bit in #371 already. There is even a PR: #768.

While it is a very interesting idea, there are difficulties to deal with if we go with this new architecture:

  • Maybe some APIs limitation as you've said. To be confirmed
  • How to update the app? I think Sparkle, which we use for the nice auto-updates, supports multi-part apps like this, but it will take a lot of plumbing to achieve it
  • How about permission to add a new daemon to launch at boot? The current LaunchAgent stores a file in the home folder path, so it's fine, but daemons put it in the root folder path apparently. Maybe with SIP and other macOS system integrity protections these days, we need a special permission from the user? Or maybe it's not even possible anymore? To be confirmed
  • We introduce XPC/IPC between the 2 processes. We have to make sure the tech is solid and the performance is good. We don't want to slow down AltTab with inter-process communication
  • How to organize the source code? 1 monorepo with both or a new repo for the backend? How about CI/CD? Etc
  • Probably a lot more things i'm not seeing right now

It's a great idea in theory, but I can tell it will be a lot of work to make a POC and then the real full solution. That being said, it could elevate AltTab from a simple app to a backend many projects could build on top of. There are already many projects that attempt to tame window state management, and as far as I know AltTab is the only reliable one. Except for yabai, but yabai injects code into the Dock and does very intrusive things that require the user to disable SIP, which is a lot to ask of users, thus making it a niche solution for really motivated power users.

That being said, using a backend would only solve the login/boot situation. A user who restarts AltTab during their session would still need a trick to see windows from other Spaces. We may have a popup to tell them that they need to reboot? It's not ideal. We would probably keep our current trick here to be able to show windows still. Mmm not sure what to think

@metacodes
Contributor

A user who restarts AltTab during their session would still need a trick to see windows from other Spaces. We may have a popup to tell them that they need to reboot?

This only happens when updating the daemon, so the best we can do is keep the daemon simple and keep its updates to a minimum. When it does need to be updated, we need a popup or something similar to let the user know. Considering how rarely the daemon would be updated, this might be acceptable. But given the technical difficulties you mentioned above, it's really hard for us to switch to that model at the moment.

@lwouis
Owner Author

lwouis commented Apr 29, 2022

It's a good point. I think Sparkle can deliver updates with that level of detail like "only update the main app" or "update the main app + the daemon". That could improve the UX for sure. We would need a new CI script to tag the delivery manifest based on which files were updated.

Yeah, it's a lot of groundwork, but it may be the only long-term solution. I don't know.

@lwouis
Owner Author

lwouis commented Jun 19, 2022

Today I did more testing on how fullscreen windows actually work. We support closing, minimizing, de-fullscreening them, from another Space. Stuff that macOS won't let you do otherwise. I realized that it creates weird artifacts:

| Scenario | Behavior |
| --- | --- |
| Close a fullscreen window from another Space | The window quickly flashes on the current Space, then is closed |
| Minimize a fullscreen window from another Space | The window quickly flashes on the current Space, then it actually works, surprisingly, even though macOS disables the yellow "minimize" button if you go on that Space to minimize with the mouse |
| De-fullscreen a fullscreen window from another Space | The window quickly flashes on the current Space, then is nowhere to be seen. Its Space is destroyed. You can still get the window back by right-clicking on its app's Dock icon, then selecting that window. It's still open, just not accessible on any Space directly. That behavior is pretty bad UX |
| Hide (an app with) a fullscreen window from another Space | Nothing happens for that window. Non-fullscreen windows of that app are hidden. This one is weird even with native macOS UI interactions. Fullscreen windows don't get hidden when you hide an app. |

A note on close and minimize: we first set kAXFullscreenAttribute to false, and only then we send the close/minimize event. For the minimize event, we wait 1s (hardcoded duration :/), because otherwise macOS ignores the command to minimize, as it's still doing the de-fullscreen animation.

Conclusion: I think that dealing with fullscreen windows in general is a broken experience on macOS. Same with AltTab. Maybe we could just give up on fullscreen windows, and always bring the user to their Space before doing any action on them. That way we always get the AXref before acting. Then for non-fullscreen windows, we can use SLSMoveWindowsToManagedSpace to bring them to the current Space before sending a command, to get the AXref. Alternatively, we could bring them to the current Space in advance, when they are spawned, so that later we don't need to.

The advantage of bringing the windows early is that then we have the AXref to show title, remove non-windows, and do commands from the current Space. The downside is that it flashes those windows for the user (maybe there is a way to hide them temporarily?). And vice-versa for the other approach.

@lwouis
Owner Author

lwouis commented Jun 20, 2022

I found this function in SkyLight: _SLSPackagesAssignDraggedWindowToDestinationSpace(int arg0, int arg1, int arg2, int arg3, int arg4, int arg5). It seems to be still available on Catalina.

image

@koekeishiya @jkelleyrtp @metacodes Do you know the complete signature of that API?

@ifsheldon

A slight deviation: I see Apple released a new kit this year at WWDC, which might be helpful to detect windows. The kit is ScreenCaptureKit. This is kind of a misuse, but one of its APIs, SCShareableContent, seems perfect for grabbing all information about all windows on all displays.

(I guess) All we need is:

  1. the permission of screen capture, because the kit requires it but we don't actually need to capture the screen
  2. use this API every time we need the window information

I'm not sure if this works because I didn't try it and I'm not a Swift developer, so this is just a suggestion. I will try this API and write a minimal demo when I'm not busy. If anyone wants to try it out, go ahead.

@lwouis
Owner Author

lwouis commented Jun 24, 2022

@ifsheldon I'm afraid ScreenCaptureKit is barely wrapping the existing CG/CGS APIs. The data it returns for windows is quite limited: https://developer.apple.com/documentation/screencapturekit/scwindow

I think it could perhaps be used for #122, where I also suggested it. But it would not solve the issues discussed in this ticket, as this new API doesn't provide us with the Accessibility window reference we need to focus/miniaturize/close windows. I also expect it would return the same windows as CGWindowListCopyWindowInfo. Notice how similar the parameters are to SCShareableContent.getExcludingDesktopWindows.

@brettstover

Perused this thread and have some thoughts, some of which might be worthwhile. I have no experience with the accessibility APIs discussed here though, so keep that in mind.

1:
If I follow the thread correctly, it's assumed that AXUIElement can be passed from process to process (e.g., from a daemon to the main app). I'm not sure if this is true, so if it is true, assume for the below points we're passing AXUIElement (or some intermediate type that AXUIElement can be reconstructed from), and if not true, then assume we're passing a dictionary of window related values.

2:
I think a login item / helper app could work well here at least as a partial solution. While it doesn't launch before login as you wanted with a daemon, it can be launched on login, run in the background windowless with no menu bar or dock item and can survive the termination and relaunch of the main app. Therefore even without launching before login, it's better positioned to have more of the data you need than the main app would be. It also inherits the permissions granted to the main app which can be helpful. Also, in another thread, it was mentioned that the AX calls are blocking and that this can be a problem. If the AX calls are made in a separate process, then this likely isn't an issue.

Good tutorial on login items: https://martiancraft.com/blog/2015/01/login-items/
Btw, https://developer.apple.com/documentation/servicemanagement/smappservice looks interesting.

3:
The approach I'd suggest here that's a bit simpler than using XPC is to set up a user defaults suite that is shared between the helper and the main app. The helper app would store which spaces and windows it has encountered in the shared user defaults and the main app would use that as a backup for any spaces it has not yet encountered. Again, I'm not sure if this would be storing a representation that allows for reconstruction of an AXUIElement or just a dictionary of basic window information. Additionally you could persist screenshots to disk and store URLs to those screenshots in the user defaults.

The main app would check to get a list of the current spaces, and check its own memory to see if it has the window/AXUIElement information needed for these spaces and if not check the shared user defaults for information for any spaces it hasn't yet navigated to. In many/most cases, either the main app or helper app would have the information for all spaces on screen, in which case all is good.

In cases where the shared user defaults doesn't already have all information needed from all spaces, then trigger existing solution for getting that data for only the spaces with that missing data. Whatever difficulties exist with that solution, at least this approach should minimize the occurrence of needing to use it.
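
The lookup order described in point 3 can be sketched as follows. This is only an illustration: plain dictionaries stand in for the main app's memory and for the shared user defaults suite, and the function name and the stored values are hypothetical.

```python
def resolve_window_info(current_space_ids, in_memory, shared_store):
    """Return (known, missing).

    known maps each space ID to the info found for it, preferring the main
    app's own memory over the shared store. missing lists the spaces for
    which neither source has data, i.e. the ones that would still need the
    existing (visible) fallback.
    """
    known, missing = {}, []
    for space_id in current_space_ids:
        if space_id in in_memory:
            known[space_id] = in_memory[space_id]
        elif space_id in shared_store:
            known[space_id] = shared_store[space_id]
        else:
            missing.append(space_id)
    return known, missing


# Made-up data: the main app knows space 1, the helper has recorded space 2.
known, missing = resolve_window_info(
    [1, 2, 3],
    in_memory={1: ["Terminal"]},
    shared_store={2: ["Safari"]},
)
print(known)    # {1: ['Terminal'], 2: ['Safari']}
print(missing)  # [3]
```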

@lwouis
Owner Author

lwouis commented Sep 14, 2022

@brettstover AltTab already launches at login, so it can monitor things in the same capacity as the alternative solution you describe. It's already multi-threaded to avoid blocking. The only downtime is during upgrades, where it restarts and loses its context. But having a background service wouldn't solve that, since that service would need to restart on upgrade as well. So it's more a question of serializing the state on disk either way. And we don't do that today because there could be differences before/after and we don't control when AltTab is back. It could be minutes, and windows could be shuffled in between.

@stephancasas

Hello, all. I'm new to this conversation, so if I'm saying something which has already been suggested, please feel free to let me know. If you're not opposed to continuing use of the private CoreGraphics framework, I've found a trick that works extremely well:

macOS continuously registers a new keyboard accelerator each time a new desktop/space is created. The accelerator isn't activated unless previously enabled by the user in System Preferences. However, using the function CGSSetSymbolicHotkeyEnabled(int, BOOL), you can activate the accelerator programmatically. What's more, since you're calling a method and not writing to a PLIST store, the change is instantly registered by the CoreGraphics keyboard events listener.

Determining what space you want for which window can be done by first querying CGSCopySpacesForWindows(CGSConnectionID id, int spacesMask, CFArrayRef windows). This will give you the CGSpaceID for a CGWindowID.

To resolve the CGSpaceID into the space's ordered index or human-readable desktop number, you can use CGSCopyManagedDisplaySpaces(CGSConnectionID id) to get an ordered list of all spaces. Flat-map the spaces from each CGDisplay entity, and then find the index of your CGSpaceID from the previous step. You now have the zero-based desktop number of the target space; add 1 to turn this into the human-readable desktop number.
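
A minimal sketch of that flat-map and lookup, written in Python for illustration. It assumes the result of CGSCopyManagedDisplaySpaces has already been converted into a list of per-display dictionaries with a "Spaces" list whose entries carry a "ManagedSpaceID"; those key names and the sample IDs are assumptions, not something confirmed in this thread. If the list also contains fullscreen Spaces, they would presumably need to be filtered out first.

```python
def desktop_index(managed_display_spaces, space_id):
    """Return the zero-based position of space_id across all displays, in order."""
    ordered_ids = [
        space["ManagedSpaceID"]
        for display in managed_display_spaces
        for space in display["Spaces"]
    ]
    # Raises ValueError if the space is unknown.
    return ordered_ids.index(space_id)


# Made-up data: two displays, three Spaces in total.
displays = [
    {"Display Identifier": "Main", "Spaces": [{"ManagedSpaceID": 1}, {"ManagedSpaceID": 5}]},
    {"Display Identifier": "hypothetical-uuid", "Spaces": [{"ManagedSpaceID": 7}]},
]
index = desktop_index(displays, 7)
print(index, index + 1)  # 2 3 (zero-based index, human-readable desktop number)
```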

Finding the correct hotkey to focus the target space can be done by reading-in the com.apple.symbolichotkeys PLIST. Switching for numbered desktops begins with desktop 1 at index 118. Add the zero-based desktop number to 118 to determine which symbolic hotkey you'll need to enable, and then call CGSSetSymbolicHotkeyEnabled(int, BOOL) to engage the listener — using the PLIST-resolved index as the first arg and YES as the second.
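
As a rough illustration of the arithmetic and the PLIST lookup, here is a Python sketch. The 118 offset comes from the paragraph above; the nesting of the PLIST ("AppleSymbolicHotKeys", then the ID as a string key, then "value" and "parameters") is an assumption about the file layout, and the sample entry is built in memory rather than read from the real preferences file.

```python
import plistlib

FIRST_DESKTOP_HOTKEY = 118  # symbolic hotkey for desktop 1, as stated above


def hotkey_id_for_desktop(zero_based_index):
    """Map a zero-based desktop number to its symbolic hotkey ID."""
    return FIRST_DESKTOP_HOTKEY + zero_based_index


def hotkey_parameters(plist_bytes, hotkey_id):
    """Return the parameters list for hotkey_id (assumed PLIST layout)."""
    prefs = plistlib.loads(plist_bytes)
    return prefs["AppleSymbolicHotKeys"][str(hotkey_id)]["value"]["parameters"]


# Made-up PLIST content standing in for com.apple.symbolichotkeys.plist.
sample = plistlib.dumps({
    "AppleSymbolicHotKeys": {
        "120": {"enabled": False, "value": {"parameters": [51, 20, 262144], "type": "standard"}},
    }
})
hotkey = hotkey_id_for_desktop(2)          # third desktop
print(hotkey, hotkey_parameters(sample, hotkey))  # 120 [51, 20, 262144]
```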

The parameters value of each entity in com.apple.symbolichotkeys.plist is structured as follows:

(
  {{ ascii_value_of_keyboard_glyph_if_applicable }},
  {{ osascript_key_code_of_keyboard_key }},
  {{ bitwise_nxkeymask_of_modifier_keys }}
)

Resolve the NXKeyMask value into the modifier keys, and then use either System Events or CGEventPost to dispatch the keyboard event. The space will snap into focus, and you can now use AXUIElementCreateApplication to make your desired window frontmost, after which you can activate the target process using [[NSRunningApplication runningApplicationWithProcessIdentifier: {{ PID }}] activateWithOptions: 2].
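
Resolving the mask can be sketched like this, again in Python for illustration. The bit values are the NX modifier masks from IOKit's IOLLEvent.h; treating 65535 as "no printable glyph" is an assumption about how the PLIST encodes non-character keys, and the function name is hypothetical.

```python
# Device-independent NX modifier masks (IOLLEvent.h).
NX_MODIFIER_MASKS = {
    "shift": 0x020000,
    "control": 0x040000,
    "option": 0x080000,
    "command": 0x100000,
    "fn": 0x800000,
}

NO_GLYPH = 65535  # assumed placeholder for keys without an ASCII value


def decode_parameters(parameters):
    """Split a (ascii, key code, modifier mask) triple into usable parts."""
    ascii_value, key_code, modifier_mask = parameters
    glyph = None if ascii_value == NO_GLYPH else chr(ascii_value)
    modifiers = [name for name, bit in NX_MODIFIER_MASKS.items() if modifier_mask & bit]
    return glyph, key_code, modifiers


print(decode_parameters([51, 20, 262144]))  # ('3', 20, ['control'])
```

The key code and the modifier names are what a keyboard-event dispatcher would then need.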

This solution is working extremely well for me. I'm not an Objective-C programmer, and I pieced this together as best I could through a lot of trial and error, so there may be places where efficiency can improve. As stated before, if I'm missing something, please feel free to point it out to me. I can post a working copy if you'd like to see some code.

@lwouis
Owner Author

lwouis commented Nov 30, 2022

@stephancasas this is interesting information. Thank you for sharing.

As I understand it, this would allow us to switch to specific Spaces. We can do this already, in a simpler way actually. We have a strategy where we spawn invisible windows in every Space. We can then focus them to force macOS to focus that Space. More info in the last bullet point of this recap.

The issue with switching Spaces is that it's visible to the user. It disturbs their work when we start visiting Spaces one by one.

How fast is your method at visiting a Space? Could we call it like 10 times in a row for 10 Spaces, really quickly, so the user sees only a "flash" on-screen?

@stephancasas

@lwouis I may have misunderstood the initial issue. Is the aim to find a different way of navigating to a space once a window is selected, or to find a different way of getting thumbnails for windows which are on other spaces?

What I've described would only be useful in the former, not the latter.

@lwouis
Owner Author

lwouis commented Nov 30, 2022

@stephancasas this ticket is about the following topic.

When windows are on other Spaces than the active Space, we can't get their AXref, which is the technical structure that lets us do many things with them (e.g. focus them, minimize them, get their title, get their screenshot, etc).

What we were doing before Monterey was to use a private API to instantly teleport all windows to the active Space. Then we would grab their AXref, then we would teleport them back to their original Spaces. From the user's perspective, they would open AltTab and see a quick flash on screen, sometimes barely noticeable, then AltTab would list all windows nicely.

The API which teleports windows is broken from Monterey onwards. This ticket investigates alternatives.

We could ask the user to visit all Spaces manually, or we could visit them automatically on launch, but all these solutions make for a bad UX. We are looking for something more invisible to the user, that would let us grab the AXref somehow.

@goulashsoup

@lwouis @koekeishiya @metacodes Maybe instead of just focusing on already known private APIs we could analyse the code stack of the macOS dock and see if there are any private APIs that are not discovered yet.

Found these two articles about how to reverse engineer macOS APIs:

@koekeishiya

The discussion above did not focus on known APIs; it included looking at basically every symbol exported by the SkyLight.framework, which is the interface to the WindowServer.

I don't remember exactly every detail that was attempted in this discussion, but the core of the issue is:

To focus a window, you need a reference through the AX API.
This is the only way to focus a specific window on macOS (unless you inject code into the Dock, which requires SIP to be disabled).

To get an AX reference for a window, that window must be on a currently visible space.

The workaround in alttab that worked for older versions of macOS was to detect windows using private APIs and move them to the currently active space, so that they would be eligible for usage through the AX API.

@goulashsoup

> The discussion above did not focus on known APIs; it included looking at basically every symbol exported by the SkyLight.framework

I was not talking about known APIs!

> To focus a window, you need a reference through the AX API.
> This is the only way to focus a specific window on macOS (unless you inject code into the Dock, which requires SIP to be disabled).

Yes, I know. You inject code into the Dock App (e.g. in window_manager_focus_window_without_raise, right?) if you don't have the axuiref (in Swift, it's an AXUIElement object).

I think the Dock App uses an API that we can expose where you use the injection.

@koekeishiya Did you take a look at the assembly code of the Dock App to find out which memory addresses to use?
If yes, did you not find any APIs used that we can expose?

@koekeishiya

> Did you take a look at the assembly code of the Dock App to find out which memory addresses to use?
> If yes, did you not find any APIs used that we can expose?

Yes, I did, and no, there is no API that does what alttab needs and that works on the newest version of macOS.

--

window_manager_focus_window_without_raise is not code injection; it simply sends bytes to a specific application based on an event protocol that I figured out by instrumenting code using Frida.re. This alone is not enough to fully focus a window, and must be used in combination with the AX API. It is used to work around a bug that makes the AX API not focus the correct window in a multi-monitor setup.

I am not going to go into details here, but basically every GUI application on macOS registers itself with the Dock (this is part of Carbon/Cocoa), setting up an event handler and a mach port for communication. The Dock runs the server part, and applications connect and give the Dock communication rights. The Dock then uses this mach port to signal an application (using the process serial number and window id) to make a specific window the key-window (focused window).
You can hook into this part, but as I said it requires injecting code into the Dock's process space, which requires SIP to be disabled.
I hooked this function for use in yabai many years ago.

@goulashsoup

> Yes I did, and no there is no API that does what alttab needs
> I am not going to go into details here ...

Well, that's a lot of detail already, thanks 😁

And I suppose we cannot "expose" the Dock App source code functions, because this is only possible for shared libraries, right?

Anyway, I want to analyze the Dock App myself, so I suppose I have to disable SIP as well.
