AnswerDotAI/playwrightnb

PlaywrightNB

PlaywrightNB provides a few small quality-of-life helpers for interactive use of the wonderful Playwright library. It's likely to be of particular interest to folks using Jupyter.

Install

pip install playwrightnb

Overview

from playwrightnb import *
from html.parser import HTMLParser

playwrightnb provides two main functions: read_page_async(url) and read_page(url). They are identical except that the first is async.

They return a tuple of the main page's HTML contents and a dict mapping iframe IDs to their HTML contents. They handle JavaScript and other trickiness largely automatically; however, you can pass a pause parameter (in milliseconds) if you need to insert some manual waits. You can also pass a timeout (also in milliseconds).
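As a minimal sketch of passing those two parameters to the async variant: the keyword names pause and timeout come from the description above, while the wrapper name fetch_with_waits and the specific values are illustrative assumptions, not part of the library.

```python
import asyncio

async def fetch_with_waits(url, pause=500, timeout=10_000):
    """Hypothetical wrapper: read a page with an extra pause and a longer timeout (both in ms)."""
    # Imported lazily so this sketch can be defined before playwrightnb is installed.
    from playwrightnb import read_page_async
    # Assumes `pause` and `timeout` are accepted as keyword arguments.
    cts, iframes = await read_page_async(url, pause=pause, timeout=timeout)
    return cts, iframes

# In a notebook:      cts, iframes = await fetch_with_waits(some_url)
# In a plain script:  cts, iframes = asyncio.run(fetch_with_waits(some_url))
```

In Jupyter an event loop is already running, so the coroutine can be awaited directly in a cell; asyncio.run is only needed outside a notebook.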

For instance, the Dyalog APL help information is provided inside an iframe that’s dynamically loaded by JS, but we are able to read it directly:

sh_url = 'https://help.dyalog.com/19.0/#UserGuide/Installation%20and%20Configuration/Shell%20Scripts.htm'
cts,iframes = read_page(sh_url)
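Since the second return value is a plain dict of iframe IDs to HTML strings, it can be inspected with ordinary Python to see which iframe holds the content. A small sketch follows; iframe_sizes is a hypothetical helper, and the example dict is made-up stand-in data (only the 'topic' ID appears in this README), not real output from the Dyalog page.

```python
def iframe_sizes(iframes):
    """Hypothetical helper: map each iframe ID to the length of its HTML, largest first."""
    sizes = {name: len(html) for name, html in iframes.items()}
    return dict(sorted(sizes.items(), key=lambda kv: kv[1], reverse=True))

# Stand-in data shaped like the second return value of read_page (not real output).
example = {'nav': '<ul></ul>', 'topic': '<h2>Shell Scripts</h2>'}
print(iframe_sizes(example))
```

The largest iframe is often, though not always, the one carrying the main content.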

Use h2md to convert the HTML to markdown:

print(h2md(iframes['topic'])[94:250])
## Shell Scripts

Shell scripts are typically executed from a terminal (or shell).

A script is executed by typing its name. User input is entered from the 

If you want to grab a particular element using a CSS selector, use url2md to read the page, find the selector, and convert the result to markdown. E.g., for accessing Discord's JS-rendered docs:

url = 'https://discord.com/developers/docs/interactions/application-commands'
sel = '.page-content-scrolling-container'
md = url2md(url, sel)
print(md[856:1215])
Application commands are native ways to interact with apps in the Discord client. There are 3 types of commands accessible in different interfaces: the chat input, a message's context menu (top-right menu or right-clicking in a message), and a user's context menu (right-clicking on a user).

## Application Command Object

###### Application Command Naming
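The character offsets used in these slices depend on the page as it was when the example was run, so they may drift as the page changes. One alternative, sketched below under the assumption that the markdown uses ATX-style `#` headings, is to pull out a section by its heading text. md_section is a hypothetical helper, not part of playwrightnb, and it does not try to skip `#` lines inside code fences.

```python
import re

def md_section(md, heading):
    """Hypothetical helper: return the text from `heading` up to the next
    heading of the same or a higher level, or '' if the heading is absent."""
    lines = md.splitlines()
    start = level = None
    for i, line in enumerate(lines):
        m = re.match(r'(#{1,6})\s+(.*)', line)
        if not m:
            continue
        if start is None:
            # Still looking for the requested heading.
            if m.group(2).strip() == heading:
                start, level = i, len(m.group(1))
        elif len(m.group(1)) <= level:
            # A sibling or parent heading ends the section.
            return '\n'.join(lines[start:i]).strip()
    return '\n'.join(lines[start:]).strip() if start is not None else ''
```

For example, md_section(md, 'Application Command Object') would return that section together with its sub-headings, assuming the heading text still matches the live page.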

If you don’t need JS-rendering or other fanciness, use get2md instead, which uses httpx.get instead of Playwright.
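A rough sketch of choosing between the two at call time is shown here. It assumes get2md takes the URL and an optional selector in the same positions as url2md, which this README does not spell out, so check the function's signature before relying on it. page_md and its needs_js flag are hypothetical names.

```python
def page_md(url, sel=None, needs_js=True):
    """Hypothetical dispatcher: use the browser only when JS rendering is needed."""
    # Imported lazily so this sketch can be defined before playwrightnb is installed.
    from playwrightnb import get2md, url2md
    # Assumption: both functions accept (url, sel) positionally.
    fetch = url2md if needs_js else get2md
    return fetch(url, sel)
```

Skipping the browser for static pages avoids launching Playwright at all, which is usually noticeably faster.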