Multiple scrollable elements #276

Open
javuiz opened this issue Dec 4, 2024 · 2 comments

javuiz commented Dec 4, 2024

On some pages the main scroll bar belongs to an inner div rather than to the main window.

Other pages have several scrollable elements, and the main window may or may not be one of them.

All scrolling done by the API calls that read more chunks seems to assume a single scroll position on the main window (e.g. when counting chunks and calling scrollToHeight()).

As a result, any content inside a scrollable area other than the main window is invisible to the API.

Additionally, scrollable elements are not considered "interactive" unless they carry a matching role attribute (which is not very common practice, afaik), so they are not included in the DOM elements provided as context to the LLM. The LLM therefore cannot be expected to decide to scroll them on its own, since it is unaware of them.
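
To make the detection problem concrete, here is a minimal sketch of how inner scrollable elements could be recognized without relying on a role attribute. It is not Stagehand code: `ScrollMetrics`, `isVerticallyScrollable` and `findScrollable` are hypothetical names, and the check is only a heuristic for inner elements (the root document scrolls even with `overflow: visible`). In a real page the three values would come from `element.scrollHeight`, `element.clientHeight` and `getComputedStyle(element).overflowY`.

```typescript
// Minimal structural type so the check can run without a browser.
// In a page, these values come from an Element and its computed style.
interface ScrollMetrics {
  scrollHeight: number; // full height of the content
  clientHeight: number; // visible height of the element
  overflowY: string;    // computed CSS overflow-y value
}

// Hypothetical helper (not part of Stagehand): true when the element
// allows vertical scrolling and its content overflows its visible box.
function isVerticallyScrollable(m: ScrollMetrics): boolean {
  const allowsScroll =
    m.overflowY === "auto" || m.overflowY === "scroll" || m.overflowY === "overlay";
  return allowsScroll && m.scrollHeight > m.clientHeight;
}

// Hypothetical helper: keep only the candidates that can actually scroll.
function findScrollable<T extends ScrollMetrics>(candidates: T[]): T[] {
  return candidates.filter(isVerticallyScrollable);
}
```

Elements found this way could then be added to the context given to the LLM, so it can choose which one to scroll.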

vladionescu commented Dec 13, 2024

It would be nice if Stagehand could try emitting mouse wheel events. Stagehand (and LLM-generated code) prefers window.scrollTo(), which doesn't work in complex DOMs as mentioned here.

Stagehand could check whether the window height equals the viewport height and, if so, try the mouse wheel.

The following does work, but it would be better if Stagehand reached for this tool when needed, instead of requiring it to be hard-coded:

await stagehand.page.mouse.wheel(0, 200);
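
The height check suggested above could look roughly like the sketch below. It is an illustration only, not Stagehand's implementation: `PageLike`, `windowCannotScroll` and `scrollDown` are hypothetical names, and `PageLike` is a minimal slice of the Playwright `Page` API (string-expression `evaluate` and `mouse.wheel`) so the logic can be tried without a browser. Note that wheel events are dispatched at the current mouse position, so the pointer is assumed to be over the area that should scroll.

```typescript
// Minimal slice of the Playwright Page API used by this sketch.
interface PageLike {
  evaluate(expression: string): Promise<unknown>;
  mouse: { wheel(deltaX: number, deltaY: number): Promise<void> };
}

// Heuristic from the comment above: when the document is no taller than
// the viewport, scrolling the window has no effect.
function windowCannotScroll(documentHeight: number, viewportHeight: number): boolean {
  return documentHeight <= viewportHeight;
}

// Hypothetical helper: scroll the window when that can work, otherwise
// fall back to a mouse wheel event. Returns the strategy that was used.
async function scrollDown(page: PageLike, deltaY: number): Promise<"wheel" | "window"> {
  const documentHeight = Number(
    await page.evaluate("document.documentElement.scrollHeight")
  );
  const viewportHeight = Number(await page.evaluate("window.innerHeight"));

  if (windowCannotScroll(documentHeight, viewportHeight)) {
    await page.mouse.wheel(0, deltaY);
    return "wheel";
  }
  await page.evaluate(`window.scrollBy(0, ${deltaY})`);
  return "window";
}
```

With a real Playwright page this would be called as `await scrollDown(stagehand.page, 200)`.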

@vladionescu

Patching the handler to add new tools is possible. The patch below, for example, enables Stagehand to scroll with the mouse wheel. A few usage notes:

  1. The patching function must be called after stagehand.init().
  2. In prompts to the LLM, you must tell it about the new scrollDownALittle tool.
/*
    Monkeypatching actHandler._performPlaywrightMethod() Stagehand method that handles tool use calls from LLM responses
    This adds a 'scrollDownALittle' tool that emits mousewheel events and works on dynamic SPAs

    You need to tell the LLM (prompt) about this new tool or it won't use it
*/
function patchScrollBehavior(stagehand: any) {
    // Get the act handler instance
    const actHandler = Reflect.get(stagehand, 'actHandler');
    if (!actHandler) {
        throw new Error('Could not access actHandler');
    }
    const proto = Object.getPrototypeOf(actHandler);
    const originalMethod = proto._performPlaywrightMethod;

    // Monkeypatch to add a tool
    proto._performPlaywrightMethod = async function(
        method: string,
        args: unknown[],
        xpath: string,
        domSettleTimeoutMs?: number
    ) {
        if (method === 'scrollDownALittle') {
            // viewportSize() returns null when no viewport is set; the 720px fallback is an assumption
            const viewport = await this.stagehand.page.viewportSize();
            const scroll_y = (viewport?.height ?? 720) * 0.9;

            await this.stagehand.page.mouse.wheel(0, scroll_y);

            await this.waitForSettledDom(domSettleTimeoutMs);
            return;
        }

        // Passthrough any other tool calls to the original implementation
        return originalMethod.call(this, method, args, xpath, domSettleTimeoutMs);
    };
}

Usage example:

await stagehand.init();
patchScrollBehavior(stagehand);  // <-------------

await stagehand.page.goto("http://localhost/");
await stagehand.act({ action: "Scroll to the bottom of the page. Only use 'scrollDownALittle' for this." });
