Edit Visually browser extension – iteration 2 #298

Draft: wants to merge 7 commits into trunk

adamziel
Collaborator

🚧 A more involved description TBD 🚧

Large changes:

  • Snappy loading
  • WordPress, PHP, and everything needed to run the editor is now shipped with the extension.
  • There's no longer an iframe or a modal. Instead, there's a separate editor window. Manifest v3 makes the iframe non-feasible.
  • An external domain with a loopback service worker is used to work around a Manifest v3 limitation.
  • An extension service worker is involved, which breaks Firefox compatibility (not sure about Safari).

cc @dmsnell

@adamziel adamziel changed the title Edit Visually: browser extension – iteration 2 Edit Visually browser extension – iteration 2 Jun 10, 2024
@brandonpayton
Member

I made a note to poke at this tomorrow. I'd love to see more of what you've done here.

> Manifest v3 makes the iframe non-feasible.
> An external domain with a loopback service worker is used to work around Manifest v3 limitation

What is this limitation?

> Extension service worker is involved, which breaks Firefox compat (not sure about Safari)

It is probably possible to use the older background page approach for a manifest v3 Firefox extension, once you are happy with the Chrome version.

Also, IIRC, the offscreen document being used here was created to fill a gap that was left when Chrome switched away from background pages, so I wonder whether the background page in Firefox could serve in the same way. (Or perhaps Firefox also has support for offscreen documents).

@adamziel
Collaborator Author

adamziel commented Jun 11, 2024

> What is this limitation?

@brandonpayton tl;dr: either you can't use eval() or inline <script> tags in content served from the extension, or, when using a "sandbox" page, you can't have a service worker.

A fuller answer is in these diagrams I drew while exploring different avenues:

[Diagram: where to run Playground]

[Diagram]

[Diagram]
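
The tl;dr above comes down to which script sources a page's Content Security Policy allows. The TypeScript sketch below is a simplified checker, not a full CSP implementation. It only looks at `script-src`, falls back to `default-src` as the standard specifies, and ignores details such as `script-src-elem` or multiple policies. It shows why an `extension_pages` policy like the one in the notes below blocks `eval()`: `'wasm-unsafe-eval'` only permits WebAssembly compilation. A `sandbox` policy with `'unsafe-inline' 'unsafe-eval'`, by contrast, allows both.

```typescript
// Simplified CSP inspection: parses a policy string into directives and
// answers whether eval() and inline <script> tags would be allowed.
type Policy = Map<string, string[]>;

function parsePolicy(policyText: string): Policy {
  const directives: Policy = new Map();
  for (const part of policyText.split(";")) {
    const tokens = part.trim().split(/\s+/).filter((token) => token.length > 0);
    if (tokens.length === 0) continue;
    const name = tokens[0].toLowerCase();
    // Only the first occurrence of a directive is honored.
    if (!directives.has(name)) directives.set(name, tokens.slice(1));
  }
  return directives;
}

// script-src falls back to default-src; undefined means "no restriction".
function scriptSources(policyText: string): string[] | undefined {
  const policy = parsePolicy(policyText);
  return policy.get("script-src") ?? policy.get("default-src");
}

function allowsEval(policyText: string): boolean {
  const sources = scriptSources(policyText);
  return sources === undefined || sources.includes("'unsafe-eval'");
}

function allowsInlineScript(policyText: string): boolean {
  const sources = scriptSources(policyText);
  if (sources === undefined) return true;
  // When a nonce or hash source is present, browsers ignore 'unsafe-inline'.
  const hasNonceOrHash = sources.some((s) => /^'(nonce|sha256|sha384|sha512)-/.test(s));
  return sources.includes("'unsafe-inline'") && !hasNonceOrHash;
}

const extensionPages = "script-src 'self' 'wasm-unsafe-eval'; default-src 'self'";
const sandboxPage =
  "sandbox allow-scripts allow-forms allow-popups allow-modals; script-src 'self' 'unsafe-inline' 'unsafe-eval'";

console.log(allowsEval(extensionPages), allowsInlineScript(extensionPages)); // false false
console.log(allowsEval(sandboxPage), allowsInlineScript(sandboxPage)); // true true
```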

@adamziel
Collaborator Author

adamziel commented Sep 11, 2024

This worked for me for eval() and for including custom scripts:

HTML:

```html
<iframe src="chrome-extension://hjaafimkfjnfdabpmpcilmpiemgmjcgb/sandboxed-page.html?abc" sandbox="allow-same-origin allow-scripts allow-popups allow-forms"></iframe>
```

Manifest.json:

```json
{
  "sandbox": {
    "pages": [
      "sandboxed-page.html"
    ]
  },
  "web_accessible_resources": [
    {
      "resources": [ "sandboxed-page.html" ],
      "matches": [ "<all_urls>" ]
    }
  ],
  "content_security_policy": {
    "sandbox": "sandbox allow-scripts allow-forms allow-popups allow-modals; script-src 'self' 'unsafe-inline' 'unsafe-eval'; child-src 'self' https://playground.wordpress.net/; frame-src 'self' https://playground.wordpress.net/;"
  }
}
```

Another idea is to postMessage to an external iframe that runs the code and responds with the results. For that, this setup worked for me:

Manifest.json:

```json
{
    "content_security_policy": {
        "extension_pages": "script-src 'self' 'wasm-unsafe-eval'; default-src 'self'; child-src 'self' https://playground.wordpress.net/; frame-src 'self' https://playground.wordpress.net/;"
    },
    "web_accessible_resources": [
        {
            "resources": ["regular-extension-page.html"],
            "matches": ["<all_urls>"]
        }
    ]
}
```

regular-extension-page.html:

```html
<iframe
    src="https://playground.wordpress.net/"
    sandbox="allow-same-origin allow-scripts allow-popups allow-forms"
></iframe>
```
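
The postMessage idea above needs a small request/response protocol on top of `postMessage()`. The TypeScript sketch below shows one option. It sends only plain data tagged with a request ID and matches each response to its request by that ID, so no callback proxies cross the boundary. The message shapes and the `Endpoint` interface are hypothetical. An in-memory pair stands in for the page and the iframe so the sketch runs anywhere.

```typescript
// Hypothetical message shapes: plain data only, no functions.
interface RpcRequest { id: number; method: string; args: unknown[] }
interface RpcResponse { id: number; result?: unknown; error?: string }

// Minimal abstraction over one side of a postMessage channel.
interface Endpoint {
  postMessage(data: unknown): void;
  addListener(listener: (data: unknown) => void): void;
}

class RpcClient {
  private nextId = 1;
  private pending = new Map<number, { resolve: (value: unknown) => void; reject: (error: Error) => void }>();
  private endpoint: Endpoint;

  constructor(endpoint: Endpoint) {
    this.endpoint = endpoint;
    endpoint.addListener((data) => {
      const response = data as RpcResponse;
      const waiter = this.pending.get(response.id);
      if (!waiter) return; // not a response to one of our requests
      this.pending.delete(response.id);
      if (response.error !== undefined) waiter.reject(new Error(response.error));
      else waiter.resolve(response.result);
    });
  }

  call(method: string, ...args: unknown[]): Promise<unknown> {
    const request: RpcRequest = { id: this.nextId++, method, args };
    return new Promise((resolve, reject) => {
      this.pending.set(request.id, { resolve, reject });
      this.endpoint.postMessage(request);
    });
  }
}

// Runs on the iframe side: executes the named handler and posts the result back.
function serveRpc(endpoint: Endpoint, handlers: Record<string, (...args: unknown[]) => unknown>): void {
  endpoint.addListener(async (data) => {
    const request = data as RpcRequest;
    const response: RpcResponse = { id: request.id };
    try {
      const handler = handlers[request.method];
      if (!handler) throw new Error(`Unknown method: ${request.method}`);
      response.result = await handler(...request.args);
    } catch (error) {
      response.error = error instanceof Error ? error.message : String(error);
    }
    endpoint.postMessage(response);
  });
}

// In-memory stand-in for a page/iframe pair: each side's postMessage delivers
// a JSON copy of the data asynchronously to the other side's listeners.
function createEndpointPair(): [Endpoint, Endpoint] {
  const listeners: Array<Array<(data: unknown) => void>> = [[], []];
  const side = (own: number): Endpoint => ({
    postMessage: (data) => {
      const copy: unknown = JSON.parse(JSON.stringify(data));
      Promise.resolve().then(() => listeners[1 - own].forEach((listener) => listener(copy)));
    },
    addListener: (listener) => {
      listeners[own].push(listener);
    },
  });
  return [side(0), side(1)];
}

const [pageSide, iframeSide] = createEndpointPair();
serveRpc(iframeSide, { add: (a, b) => (a as number) + (b as number) });
new RpcClient(pageSide).call("add", 2, 3).then((sum) => console.log(sum)); // 5
```

In a browser, `postMessage` would wrap `iframe.contentWindow.postMessage(data, targetOrigin)`, and `addListener` would wrap a `message` event listener that checks `event.origin` before trusting the data.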

@adamziel
Collaborator Author

Here's a bunch of notes I took when exploring this

  • Chrome extensions are extremely restricted in their ability to run JavaScript. Manifest V3 limits running inline scripts, eval(), iframes, and more.
  • Goal: Iframe Gutenberg in Playground within GitHub. Preload it after the page load. Do not preload one wasm module per GitHub tab as that would get very slow very quickly. Instead, only run a single Playground site and point all the iframes there.
  • Manifest V3 limitations
    • No inline <script> tags at all, not even this: `<script integrity="sha256-vv1iqjMZ5xNXCek5kMHs5l1Vmbsbmbn3V/dLMQlI/HQ=" crossorigin="anonymous">`
      • Can't enable the above in manifest.json, even with script-src 'sha256-vv1iqjMZ5xNXCek5kMHs5l1Vmbsbmbn3V/dLMQlI/HQ=': the browser complains about an insecure CSP and refuses to load the extension.
  • Challenges
    • Run a WordPress site.
      • Idea 1: Loading Playground from playground.wordpress.net.

        • The only place where we can run a site instance and keep it alive for a long time seems to be an "offscreen page". It's a special API available to extensions, and the page has access to DOM APIs.
        • Problem: We can't use the JavaScript client's startPlaygroundWeb() because there's eval() involved in exchanging messages with Playground. That happens in the Comlink.js library; I suspect it's because we pass callback functions around. Perhaps a refactoring would solve this. Anyway.
        • We can put an <iframe src="https://playground.wordpress.net"></iframe> in there, yay! We can even capture the /scope:9087313 path of the site opened in that iframe. Blueprint is passed via the URL fragment.
        • ✅ We can open that site via window.open(siteUrl)
        • ❌ We can't embed that site in an iframe. The service worker correctly receives the request from an iframe, but Chrome somehow prevents it from passing a message to the web worker running on the offscreen page.
        • Idea: use the extension message passing mechanism to work around that limitation.
      • Idea 2: Ship Playground with the extension

        • I bundled PHP, WordPress etc with the extension.
        • The extension's service worker handles fetch events in the same way as the service worker on playground.wordpress.net. It's very much like a web server.
        • That service worker handles requests to the extension URL, for example chrome-extension://fbalaiceeeoammejklginmckdfffgeje/wp-admin/post-new.php
        • ✅ WordPress loads when navigating directly to chrome-extension://fbalaiceeeoammejklginmckdfffgeje/wp-admin/post-new.php
          • ❌ No JavaScript runs 😞
          • Challenge: WordPress expects an http://-based URL, and gets redirects and pages wrong when served from chrome-extension://
          • Solution: Rewrite requests and responses. It's not perfect and there are likely better solutions, but it got us off the ground.
          • ❌ That page won't load inside an <iframe> embedded on another page.
        • ✅ It loaded in a "popup" for me! Could be a good middle ground.
        • I got wp-login.php to load in a content script iframe after listing it under web_accessible_resources!
          • ❌ It loaded CSS files!
          • It still complained like this: Refused to apply inline style because it violates the following Content Security Policy directive: "default-src 'self'".
        • ✅ wp-login.php loads via chrome.windows.create
          • ❌ No JavaScript runs 😞 Refused to execute inline script because it violates the following Content Security Policy directive: "script-src 'self' 'wasm-unsafe-eval' 'inline-speculation-rules' http://localhost:* http://127.0.0.1:*". Either the 'unsafe-inline' keyword, a hash ('sha256-fzaeQ6PuVMQ+o48ATAh9SfYo1nWfwei03GKf11f6yUM='), or a nonce ('nonce-...') is required to enable inline execution.
      • Idea 3: Use sandbox

        • It doesn't seem like it could load extension pages, though, or use a service worker.
        • A sandboxed page won't have access to extension APIs, or direct access to non-sandboxed pages (it may communicate with them using postMessage()).
        • It comes from a non-secure origin so it can't register its own service worker. Clicking WordPress links, thus, would not work. Unless the extension service worker could intercept that somehow.
        • WordPress/wordpress-playground, branch edit-visually-try-sandbox. Need to push from my laptop on wifi.
          Inside the web page:

```html
<iframe src="chrome-extension://hjaafimkfjnfdabpmpcilmpiemgmjcgb/sandboxed-page.html?abc" sandbox="allow-same-origin allow-scripts allow-popups allow-forms"></iframe>
```

Manifest.json:

```json
{
  "sandbox": {
    "pages": [
      "sandboxed-page.html"
    ]
  },
  "web_accessible_resources": [
    {
      "resources": [ "sandboxed-page.html" ],
      "matches": [ "<all_urls>" ]
    }
  ],
  "content_security_policy": {
    "sandbox": "sandbox allow-scripts allow-forms allow-popups allow-modals; script-src 'self' 'unsafe-inline' 'unsafe-eval'; child-src 'self' https://playground.wordpress.net; frame-src 'self' https://playground.wordpress.net;"
  }
}
```
		- The error says `Service worker is disabled because the context is sandboxed and lacks the 'allow-same-origin' flag.`
			- The iframe does have that flag in its sandbox attribute.
			- The manifest won't accept an allow-same-origin flag in content_security_policy > sandbox 😞 See [the doc](https://developer.chrome.com/docs/extensions/reference/manifest/sandbox) page:
				- You can specify your CSP value to restrict the sandbox even further, but it MUST include the "sandbox" directive and MUST NOT have the allow-same-origin token
			- I can, however, use the sandboxed page to iframe https://playground.wordpress.net directly on [GitHub.com](http://github.com/), which is otherwise impossible due to CSP restrictions!
				- The downside: the iframed service worker still won't work because of the same error:
						- `Failed to read the 'serviceWorker' property from 'Navigator': Service worker is disabled because the context is sandboxed and lacks the 'allow-same-origin' flag. `
			- ✅ However, **I can embed Playground on [GitHub.com](http://github.com/) by iframing a regular extension page and iframing Playground in there!**
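
The manifest rule quoted above can be written down as a small check. The TypeScript sketch below uses hypothetical helpers; Chrome performs its own validation when it loads the manifest. The sketch also encodes the consequence described in the error message. Without `allow-same-origin`, the sandboxed page runs in an opaque origin, so it cannot use a service worker. The two requirements cannot both be satisfied, which is why the exploration moved to a regular extension page.

```typescript
// Hypothetical helper mirroring the documented rule for
// content_security_policy > sandbox: it MUST include the "sandbox"
// directive and MUST NOT contain the allow-same-origin token.
function sandboxCspProblems(policyText: string): string[] {
  const directives = policyText
    .split(";")
    .map((part) => part.trim().split(/\s+/))
    .filter((tokens) => tokens[0] !== "");
  const sandbox = directives.find((tokens) => tokens[0].toLowerCase() === "sandbox");
  if (!sandbox) return ['missing the "sandbox" directive'];
  if (sandbox.includes("allow-same-origin")) return ['"allow-same-origin" is not allowed'];
  return [];
}

// A sandboxed document without allow-same-origin runs in an opaque origin,
// and navigator.serviceWorker is unavailable there.
function sandboxCanUseServiceWorker(sandboxTokens: string[]): boolean {
  return sandboxTokens.includes("allow-same-origin");
}

const manifestPolicy =
  "sandbox allow-scripts allow-forms allow-popups allow-modals; script-src 'self' 'unsafe-inline' 'unsafe-eval'";
console.log(sandboxCspProblems(manifestPolicy).length); // 0
console.log(sandboxCanUseServiceWorker(["allow-scripts", "allow-forms", "allow-popups", "allow-modals"])); // false
```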

Manifest.json:

```json
{
    "content_security_policy": {
        "extension_pages": "script-src 'self' 'wasm-unsafe-eval'; default-src 'self' https://playground-editor-extension.pages.dev/; child-src 'self' https://playground.wordpress.net; frame-src 'self' https://playground.wordpress.net;"
    },
    "web_accessible_resources": [
        {
            "resources": ["regular-extension-page.html"],
            "matches": ["<all_urls>"]
        }
    ]
}
```

regular-extension-page.html:

```html
<iframe
    src="https://playground.wordpress.net"
    sandbox="allow-same-origin allow-scripts allow-popups allow-forms"
></iframe>
```
	- Idea 4: listen to [playground.wordpress.net](http://playground.wordpress.net/) service worker messages and manually pass them to the web worker using Chrome extension message passing
		- This is the one that worked out in the end. I just needed a secure origin allowed to register its own service workers.
		- Downside: You must be online the first time you use the extension, which could be a problem if we try to reuse the same approach in WebView-based mobile apps. Maybe, then, we need to skip the WebView and directly run PHP.wasm via a WASI runtime/library in these mobile apps instead?
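
Idea 2 above mentions rewriting requests and responses, because WordPress expects an `http://` URL while the pages are served from `chrome-extension://`. The TypeScript sketch below shows that mapping. The extension ID and `http://playground.internal` are placeholders. The response rewrite is a naive textual replacement, which is exactly the kind of approach that misses encoded URLs such as JSON's `http:\/\/`.

```typescript
// Placeholder bases: a made-up 32-character extension ID and an illustrative host.
const EXTENSION_BASE = "chrome-extension://abcdefghijklmnopabcdefghijklmnop";
const WORDPRESS_BASE = "http://playground.internal";

function swapBase(url: string, from: string, to: string): string {
  if (!url.startsWith(from)) return url;
  const rest = url.slice(from.length);
  // Guard against a longer host that merely shares the prefix.
  if (rest !== "" && !/^[\/?#]/.test(rest)) return url;
  return to + rest;
}

// Incoming request from the extension page -> URL WordPress expects.
const toWordPressUrl = (url: string): string => swapBase(url, EXTENSION_BASE, WORDPRESS_BASE);
// URL produced by WordPress -> URL the browser can load from the extension.
const toExtensionUrl = (url: string): string => swapBase(url, WORDPRESS_BASE, EXTENSION_BASE);

interface SimpleResponse { status: number; headers: Record<string, string>; body: string }

function rewriteResponse(response: SimpleResponse): SimpleResponse {
  const headers: Record<string, string> = {};
  for (const [name, value] of Object.entries(response.headers)) {
    // Redirects must point back at the extension origin.
    headers[name] = name.toLowerCase() === "location" ? toExtensionUrl(value) : value;
  }
  // Naive textual replacement: misses escaped forms such as "http:\/\/playground.internal".
  const body = response.body.split(WORDPRESS_BASE).join(EXTENSION_BASE);
  return { status: response.status, headers, body };
}

console.log(toWordPressUrl(`${EXTENSION_BASE}/wp-admin/post-new.php`));
// http://playground.internal/wp-admin/post-new.php
```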

One last challenge: Only run a single WASM instance of Playground and handle all the requests there. Do not spin up a new instance for each site.

Conceptually, all the iframed Playgrounds would have to pass messages to a single worker.

  • Regular content script -> <iframe src="loopback service worker"> -> intercept fetch events via window.onmessage -> pass to extension service worker -> pass to offscreen page for handling
  • However, this doesn't work; see the diagram:
    [Diagram]
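
The "single WASM instance" goal amounts to funneling requests from many tabs into one shared handler. Suppose that handler (one PHP instance) should process one request at a time. A promise-chained FIFO queue can then serialize the work while routing each response back to its caller. The TypeScript sketch below leaves out the message-passing hops shown in the diagram. `SingleInstanceQueue`, `Handler`, and the client IDs are hypothetical.

```typescript
// One shared request handler, e.g. a single PHP instance (hypothetical signature).
type Handler = (path: string) => Promise<string>;

interface RoutedResponse { clientId: string; path: string; body: string }

// Serializes requests from many clients (tabs) onto one handler.
class SingleInstanceQueue {
  private tail: Promise<unknown> = Promise.resolve();
  private handler: Handler;

  constructor(handler: Handler) {
    this.handler = handler;
  }

  request(clientId: string, path: string): Promise<RoutedResponse> {
    // Each request starts only after the previous one has settled.
    const result = this.tail.then(async () => ({ clientId, path, body: await this.handler(path) }));
    // Keep the chain going even if this request fails.
    this.tail = result.catch(() => undefined);
    return result;
  }
}

const queue = new SingleInstanceQueue(async (path) => `<html>${path}</html>`);
queue.request("tab-1", "/wp-admin/").then((r) => console.log(r.clientId, r.body));
// tab-1 <html>/wp-admin/</html>
```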

adamziel added a commit to WordPress/wordpress-playground that referenced this pull request Oct 14, 2024
…ools (#1888)

Let's officially kick off [the Data
Liberation](https://wordpress.org/data-liberation/) efforts under the
Playground umbrella and unlock powerful new use cases for WordPress.

## Rationale

### Why work on Data Liberation?

WordPress core _really_ needs reliable data migration tools. There's
just no reliable, free, open source solution for:

-   Content import and export
-   Site import and export
- Site transfer and bulk transfers, e.g. mass WordPress -> WordPress, or
Tumblr -> WordPress
-   Site-to-site synchronization

Yes, there's the WXR content export. However, it won't help you back up a
photography blog full of media files, plugins, API integrations, and
custom tables. There are paid products out there, but nothing in core.

At the same time, so many Playground use-cases are **all about moving
your data**. Exporting your site as a zip archive, migrating between
hosts with the [Data Liberation browser
extension](https://github.com/WordPress/try-wordpress/), creating
interactive tutorials and showcasing beautiful sites using [the
Playground
block](https://wordpress.org/plugins/interactive-code-block/),
previewing Pull Requests, building new themes, and [editing
documentation](#1524)
are just the tip of the iceberg.

### Why do the existing data migration tools fall short?

Moving data around seems easy, but it's a complex problem – consider
migrating links.

Imagine you're moving a site from `https://my-old-site.com` to
`https://my-new-site.com/blog/`. If you just moved the posts, all the
links would still point to the old domain, so you'd need an importer that
can adjust all the URLs in your entire database. However, typical tools
like `preg_replace` or `wp search-replace` can only replace some URLs
correctly. They won't reliably adjust deeply encoded data, such as a URL
inside JSON inside an HTML comment inside a WXR export.

The only way to perform a reliable replacement here is to carefully
parse each and every data format and replace the relevant parts of the
URL at the bottom of it. That requires four parsers: an XML parser, an
HTML parser, a JSON parser, and a WHATWG URL parser. Most of those tools
don't exist in PHP. PHP provides `json_encode()`, which isn't free of
issues, and that's it. You can't even rely on DOMDocument to parse XML
because of its limited availability and non-streaming nature.
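
To make the layering concrete, here is a minimal TypeScript sketch covering two of the layers mentioned above: JSON attributes inside a block comment, and the URL itself. It assumes Gutenberg-style `<!-- wp:name {json} -->` delimiters and uses a regex where a real implementation would use a proper block parser. It leaves out the XML layer entirely, and re-encodes with `JSON.stringify`, which does not reproduce WordPress's own attribute escaping. The domains match the example above.

```typescript
function tryParseUrl(value: string): URL | null {
  try {
    return new URL(value);
  } catch {
    return null; // not an absolute URL
  }
}

// Moves a URL under oldBase to the same relative spot under newBase, using the
// WHATWG URL parser instead of string matching. Bases are assumed to end with "/".
function rewriteUrl(value: string, oldBase: URL, newBase: URL): string {
  const url = tryParseUrl(value);
  if (url === null || url.origin !== oldBase.origin || !url.pathname.startsWith(oldBase.pathname)) return value;
  const rewritten = new URL(newBase.href);
  rewritten.pathname = newBase.pathname + url.pathname.slice(oldBase.pathname.length);
  rewritten.search = url.search;
  rewritten.hash = url.hash;
  return rewritten.href;
}

// Applies `map` to every string inside a decoded JSON value.
function mapJsonStrings(value: unknown, map: (s: string) => string): unknown {
  if (typeof value === "string") return map(value);
  if (Array.isArray(value)) return value.map((item) => mapJsonStrings(item, map));
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, item] of Object.entries(value)) out[key] = mapJsonStrings(item, map);
    return out;
  }
  return value;
}

// Shortcut: a regex stands in for a real block delimiter parser.
const BLOCK_OPENER = /<!--\s+(wp:[a-z0-9\/-]+)\s+(\{.*?\})\s+(\/?)-->/g;

function rewriteBlockMarkup(html: string, oldBase: string, newBase: string): string {
  const from = new URL(oldBase);
  const to = new URL(newBase);
  return html.replace(BLOCK_OPENER, (_match: string, name: string, json: string, selfClosing: string) => {
    // Decode the JSON layer first, so escaped forms like "https:\/\/" are handled.
    const attributes = mapJsonStrings(JSON.parse(json), (s) => rewriteUrl(s, from, to));
    return `<!-- ${name} ${JSON.stringify(attributes)} ${selfClosing}-->`;
  });
}

const block = '<!-- wp:image {"url":"https:\\/\\/my-old-site.com\\/wp-content\\/uploads\\/cat.jpg"} /-->';
// A plain string search finds nothing, because the slashes are JSON-escaped:
console.log(block.includes("https://my-old-site.com")); // false
console.log(rewriteBlockMarkup(block, "https://my-old-site.com/", "https://my-new-site.com/blog/"));
// <!-- wp:image {"url":"https://my-new-site.com/blog/wp-content/uploads/cat.jpg"} /-->
```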

### Why build this in Playground?

Playground gives us a lot for free:

- **Customer-centric environment.** The need to move data around is so
natural in Playground. So many people asked for reliable WXR imports,
site exports, synchronization with git, and the ability to share their
Playground. Playground allows us to get active users and customer
feedback every step of the way.
- **Free QA**. Anyone can share a testing link and easily report any
problems they found. Playground is the perfect environment to get ample,
fast moving feedback.
- **Space to mature the API**. Playground doesn’t provide the same
backward compatibility guarantees as WordPress core. It's easy to
prototype a parser, find a use case where the design breaks down, and
start over.
- **Control over the runtime.** Playground can lean on PHP extensions to
validate our ideas, test them on simulated slow hardware, and ship
them to a tablet to see how they do when the app goes into background
and the internet is flaky.

Playground enables methodically building spec-compliant software to
create the solid foundation WordPress needs.

## The way there

### What needs to be built?

There's been a lot of [gathering information, ideas, and
tools](https://core.trac.wordpress.org/ticket/60375). This writeup is
based on 10 years' worth of site transfer problems, WordPress
synchronization plugins, chats with developers, analyzing existing
codebases, past attempts at data importing, non-WordPress tools,
discussions, and more.

WordPress needs parsers. Not just any parsers: they must be streaming,
re-entrant, fast, standards-compliant, and tested using a large body of
possible inputs. The data synchronization tools must account for data
conflicts, WordPress plugins, invalid inputs, and unexpected power
outages. The errors must be non-fatal, retryable, and allow manual
resolution by the user. No data loss, ever. The transfer target site
should be usable as early as possible and show no broken links or images
during the transfer. That's the gist of it.
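
To illustrate what "streaming" and "re-entrant" mean in practice, here is a deliberately tiny TypeScript sketch. It is a line tokenizer rather than a real XML or HTML parser. It accepts input split into arbitrary chunks and keeps only the unfinished tail in memory. It also exposes its state as plain data, so a run can be saved (say, before a request times out) and resumed by a fresh instance. `StreamingLineTokenizer` and its state shape are hypothetical.

```typescript
// Plain-data state, so it can be serialized and restored later.
interface TokenizerState {
  buffer: string; // unfinished input waiting for more data
  consumed: number; // characters already emitted as complete tokens
}

class StreamingLineTokenizer {
  private state: TokenizerState;

  constructor(saved?: TokenizerState) {
    this.state = saved ? { ...saved } : { buffer: "", consumed: 0 };
  }

  // Feeds one chunk and returns every line the chunk completes.
  push(chunk: string): string[] {
    const lines: string[] = [];
    let buffer = this.state.buffer + chunk;
    let newline = buffer.indexOf("\n");
    while (newline !== -1) {
      lines.push(buffer.slice(0, newline));
      this.state.consumed += newline + 1;
      buffer = buffer.slice(newline + 1);
      newline = buffer.indexOf("\n");
    }
    this.state.buffer = buffer;
    return lines;
  }

  // Snapshot for pausing; pass it to a new instance to resume.
  save(): TokenizerState {
    return { ...this.state };
  }
}

const tokenizer = new StreamingLineTokenizer();
console.log(JSON.stringify(tokenizer.push("<post>1</po"))); // []
console.log(JSON.stringify(tokenizer.push("st>\n<post>2</post>\n<po"))); // ["<post>1</post>","<post>2</post>"]
const resumed = new StreamingLineTokenizer(JSON.parse(JSON.stringify(tokenizer.save())));
console.log(JSON.stringify(resumed.push("st>3</post>\n"))); // ["<post>3</post>"]
```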

A number of parsers have already been prototyped. There's even [a draft
of a reliable URL rewriting
library](https://github.com/adamziel/site-transfer-protocol). Here's a
bunch of early drafts of specific streaming use-cases:

- [A URL
parser](https://github.com/adamziel/site-transfer-protocol/blob/trunk/src/WP_URL.php)
- [A block markup
parser](https://github.com/adamziel/site-transfer-protocol/blob/trunk/src/WP_Block_Markup_Processor.php)
- [An XML
parser](WordPress/wordpress-develop#6713), also
explored by @dmsnell and @jonsurrell
- [A Zip archive
parser](https://github.com/WordPress/blueprints-library/blob/87afea1f9a244062a14aeff3949aae054bf74b70/src/WordPress/Zip/ZipStreamReader.php)
- [A multihandle HTTP
client](https://github.com/WordPress/blueprints-library/blob/trunk/src/WordPress/AsyncHttp/Client.php)
without a curl dependency
- [A MySQL query
parser](WordPress/sqlite-database-integration#157)
started by @zieladam and now explored by @JanJakes
- [A stream chaining
API](adamziel/wxr-normalize#1) to connect all
these pieces

On top of that, WordPress core now has an HTML parser, and @dmsnell has
been exploring a
[UTF-8](WordPress/wordpress-develop#6883)
decoder that would enable fast and regex-less URL detection in long
data streams.

There are still technical challenges to figure out, such as how to pause
and resume the data streaming. As this work progresses, you'll start
seeing incremental improvements in Playground. One possible roadmap is
shipping a reliable content importer, then a reliable site zip importer
and exporter, then cloning a site, and then extending towards
full-featured site transfers and synchronization.

### How soon can it be shipped?

Three points:

* No dates.
* Let's keep building on top of prior work and ship meaningful user
flows often.
* Let's not ship any stable public APIs until the design is mature.

For example, the [Try WordPress
extension](https://github.com/WordPress/try-wordpress/) can already give
you a Playground site, even if you cannot migrate it to another
WordPress site just yet.

**Shipping matters. At the same time, taking the time required to build
rigorous, reliable software is also important**. An occasional early
version of this or that parser may be shipped once its architecture
seems alright, but the architecture and the stable API won't be rushed.
That would jeopardize the entire project. This project aims for a solid
design that will serve WordPress for years.

The progress will be communicated in the open, while maintaining
feedback loops and using the work to ship new Playground features.

## Plans, goals, details

### Next steps

Let's start with building a tool to export and import _a single
WordPress post_. Yes! Just one post. The tricky part is that all the
URLs will have to be preserved.

From there, let's explore the breadth and depth of the problem, e.g.:

* Rewriting links
* Frontloading media files
* Preserving dependent data (post meta, custom tables, etc.)
* Exporting/importing a WXR file using the above
* Pausing and resuming a WXR export/import
* Exporting/importing a full WordPress site as a zip file

Ideally, each milestone will result in a small, readily reusable tool.
For example "paste WordPress post, paste a new site URL, get your post
migrated".

There's an ample body of existing work. Let's keep the existing
codebases (e.g. WXR, site migration plugins) and discussions open in a
browser window during this work. Let's involve the authors of these
tools, ask them questions, ask them for reviews. Let's publish the
progress and the challenges encountered on the way.

### Design goals

- **Fault tolerance** – all the data tools should be able to start,
stop, resume, tolerate errors, accept alternative data from the user,
e.g. media files, posts etc.
- **WordPress-first** – let's build everything in PHP using WordPress
naming conventions.
- **Compatibility** – Every WordPress version, PHP version (7.2+, CLI),
and Playground runtime (web, CLI, browser extension, desktop app, CI
etc.) should be supported.
- **Dependency-free** – No PHP extensions required. If this means we
can't rely on cURL, then let's build an HTTP client from scratch. Only
minimal Composer dependencies allowed, and only when absolutely
necessary.
- **Simplicity** – no advanced OOP patterns. Our role model is
[WP_HTML_Processor](https://developer.wordpress.org/reference/classes/wp_html_processor/)
– a **single class** that can parse nearly all HTML. There's no "Node",
"Element", "Attribute" classes etc. Let's aim for the same here.
- **Extensibility** – Playground should be able to benefit from, say,
a WASM Markdown parser even if WordPress core cannot.
- **Reusability** – Each library should be framework-agnostic and usable
outside of WordPress. We should be able to use them in WordPress core,
WP-CLI, Blueprint steps, Drupal, Symfony bundles, non-WordPress tools
like https://github.com/adamziel/playground-content-converters, and even
in Next.js via PHP.wasm.
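
The fault tolerance goal above could look roughly like the following TypeScript sketch, even though the real tools are meant to be PHP. It is a hypothetical import loop that retries each item a bounded number of times and records items that still fail instead of aborting. After every item it reports a checkpoint, so an interrupted run can resume where it stopped. `runImport`, `Checkpoint`, and the retry policy are illustrative assumptions.

```typescript
interface Checkpoint {
  next: number; // index of the next item to import
  failed: number[]; // items left for manual resolution by the user
}

type ImportItem = (index: number) => Promise<void>;

async function runImport(
  total: number,
  importItem: ImportItem,
  resumeFrom: Checkpoint = { next: 0, failed: [] },
  maxAttempts = 3,
  saveCheckpoint: (checkpoint: Checkpoint) => void = () => {},
): Promise<Checkpoint> {
  const state: Checkpoint = { next: resumeFrom.next, failed: [...resumeFrom.failed] };
  while (state.next < total) {
    let imported = false;
    for (let attempt = 1; attempt <= maxAttempts && !imported; attempt++) {
      try {
        await importItem(state.next);
        imported = true;
      } catch {
        // Errors are non-fatal: try again, up to maxAttempts.
      }
    }
    if (!imported) state.failed.push(state.next);
    state.next++;
    // Persist progress so a crash or timeout can resume from here.
    saveCheckpoint({ next: state.next, failed: [...state.failed] });
  }
  return state;
}

// Usage: resume an import of 3 items from a previously saved checkpoint.
runImport(3, async (index) => console.log(`importing item ${index}`), { next: 1, failed: [] }).then(
  (checkpoint) => console.log(checkpoint),
);
```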


### Prior art

Here are a few codebases that need to be reviewed at a minimum, and brought
into this project at most:

- URL rewriter: https://github.com/adamziel/site-transfer-protocol
- URL detector:
WordPress/wordpress-develop#7450
- WXR rewriter: https://github.com/adamziel/wxr-normalize/
- Stream Chain: adamziel/wxr-normalize#1
- WordPress/wordpress-develop#5466
- WordPress/wordpress-develop#6666
- XML parser: WordPress/wordpress-develop#6713
- Streaming PHP parsers:
https://github.com/WordPress/blueprints-library/tree/trunk/src/WordPress
- Zip64 support (in JS ZIP parser):
#1799
- Local Zip file reader in PHP (seeks to central directory, seeks back
as needed):
https://github.com/adamziel/wxr-normalize/blob/rewrite-remote-xml/zip-stream-reader-local.php
- WordPress/wordpress-develop#6883
- Blocky formats – Markdown <-> Block markup WordPress plugin:
https://github.com/dmsnell/blocky-formats
- Sandbox Site plugin that exports and imports WordPress to/from a zip
file:
https://github.com/WordPress/playground-tools/tree/trunk/packages/playground
- WordPress + Playground CLI setup to import, convert, and export
data: https://github.com/adamziel/playground-content-converters
- Markdown -> Playground workflow _and WordPress plugins_:
https://github.com/adamziel/playground-docs-workflow
- _Edit Visually_ browser extension for bringing data in and out of
Playground: WordPress/playground-tools#298
- _Try WordPress_ browser extension that imports existing WordPress and
non-WordPress sites to Playground:
https://github.com/WordPress/try-wordpress/
- Humanmade WXR importer designed by @rmccue:
https://github.com/humanmade/WordPress-Importer
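
Some context on the "seeks to central directory" approach of the local Zip reader listed above: a ZIP archive ends with an End of Central Directory record that stores where the central directory starts. The record carries signature `0x06054b50` and takes 22 bytes plus a comment of up to 65,535 bytes. A reader can scan backwards from the end of the file to find it, then seek to the recorded offset. The TypeScript sketch below works on an in-memory buffer and ignores Zip64. A real streaming reader would read only the tail of the file.

```typescript
interface EndOfCentralDirectory {
  entries: number; // total number of central directory records
  size: number; // size of the central directory in bytes
  offset: number; // where the central directory starts
}

const EOCD_SIGNATURE = 0x06054b50;
const EOCD_MIN_SIZE = 22;

function findEndOfCentralDirectory(zip: Uint8Array): EndOfCentralDirectory | null {
  const view = new DataView(zip.buffer, zip.byteOffset, zip.byteLength);
  // The record may be followed by a comment of up to 0xffff bytes.
  const earliest = Math.max(0, zip.length - EOCD_MIN_SIZE - 0xffff);
  for (let pos = zip.length - EOCD_MIN_SIZE; pos >= earliest; pos--) {
    if (view.getUint32(pos, true) !== EOCD_SIGNATURE) continue;
    const commentLength = view.getUint16(pos + 20, true);
    // Reject signature bytes that merely occur inside compressed data.
    if (pos + EOCD_MIN_SIZE + commentLength !== zip.length) continue;
    return {
      entries: view.getUint16(pos + 10, true),
      size: view.getUint32(pos + 12, true),
      offset: view.getUint32(pos + 16, true),
    };
  }
  return null;
}

// An empty ZIP archive is just an End of Central Directory record.
const emptyArchive = new Uint8Array(22);
new DataView(emptyArchive.buffer).setUint32(0, EOCD_SIGNATURE, true);
console.log(findEndOfCentralDirectory(emptyArchive)); // { entries: 0, size: 0, offset: 0 }
```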

### Related resources

- [Site transfer protocol](https://core.trac.wordpress.org/ticket/60375)
- [Existing data migration
plugins](https://core.trac.wordpress.org/ticket/60375#comment:32)
- WordPress/data-liberation#74
- #1524
- WordPress/gutenberg#65012

### The project structure

The structure of the `data-liberation` package is an open exploration
and will change multiple times. Here's what it aims to achieve.

**Structural goals:**

- Publish each library as a separate Composer package
- Publish each WordPress plugin separately (perhaps a single plugin
would be the most useful?)
- No duplication of libraries between WordPress plugins
- Easy installation in Playground via Blueprints, e.g. no `composer
install` required
- Compatibility with different Playground runtimes (web, CLI) and
versions of WordPress and PHP

**Logical parts:**

- First-party libraries, e.g. streaming parsers
- WordPress plugins where those libraries are used, e.g. content
importers
- Third party libraries installed via Composer, e.g. a URL parser

**Ideas:**

- Use Composer dependency graph to automatically resolve dependencies
between libraries and WordPress plugins
- or use WordPress "required plugins" feature to manage dependencies
- or use Blueprints to manage dependencies
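
Whichever of these ideas ends up managing dependencies, resolving an install order between libraries and plugins is a topological sort of the dependency graph. A minimal TypeScript sketch using Kahn's algorithm follows; the package names are hypothetical.

```typescript
// Maps each package to the packages it depends on.
type DependencyGraph = Record<string, string[]>;

// Kahn's algorithm: repeatedly emit packages whose dependencies are all emitted.
function installOrder(graph: DependencyGraph): string[] {
  const remaining = new Map<string, number>(); // unmet dependency count per package
  const dependents = new Map<string, string[]>();
  for (const [pkg, deps] of Object.entries(graph)) {
    if (!remaining.has(pkg)) remaining.set(pkg, 0);
    for (const dep of deps) {
      if (!remaining.has(dep)) remaining.set(dep, 0);
      remaining.set(pkg, (remaining.get(pkg) ?? 0) + 1);
      dependents.set(dep, [...(dependents.get(dep) ?? []), pkg]);
    }
  }
  // Start with dependency-free packages, sorted alphabetically.
  const ready = Array.from(remaining.keys())
    .filter((name) => remaining.get(name) === 0)
    .sort();
  const order: string[] = [];
  while (ready.length > 0) {
    const next = ready.shift() as string;
    order.push(next);
    for (const dependent of dependents.get(next) ?? []) {
      const left = (remaining.get(dependent) ?? 0) - 1;
      remaining.set(dependent, left);
      if (left === 0) ready.push(dependent);
    }
  }
  if (order.length !== remaining.size) throw new Error("Dependency cycle detected");
  return order;
}

const exampleGraph: DependencyGraph = {
  "url-rewriter-plugin": ["url-parser", "block-markup-processor"],
  "block-markup-processor": ["html-processor"],
  "url-parser": [],
  "html-processor": [],
};
console.log(installOrder(exampleGraph).join(" -> "));
// html-processor -> url-parser -> block-markup-processor -> url-rewriter-plugin
```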


cc @brandonpayton @bgrgicak @mho22 @griffbrad @akirk @psrpinto @ashfame
@ryanwelcher @justintadlock @azaozz @annezazu @mtias @schlessera
@swissspidy @eliot-akira @sirreal @obenland @rralian @ockham
@youknowriad @ellatrix @mcsf @hellofromtonya @jsnajdr @dawidurbanski
@palmiak @JanJakes @luisherranz @naruniec @peterwilsoncc @priethor @zzap
@michalczaplinski @danluu
Projects
Archived in project
2 participants