
Tracking Issue: Next-gen PHP Importers for Data Liberation #1894

Open
0 of 3 issues completed
@adamziel

Next Gen importers

This issue tracks the work related to Data Liberation Phase 2: Importing and Exporting Structured Data, that is:

  • Parsers
  • Importers
  • User and developer tools.

WordPress needs parsers. Not just any parsers, but parsers that are streaming, re-entrant, fast, standards-compliant, and tested against a large body of possible inputs. A seemingly simple task such as moving a post to another website requires rewriting the URLs in that post, downloading the assets, and handling network failures. More complex tasks, such as importing a WXR file or transferring an entire site, are even more demanding.
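To make "streaming" and "re-entrant" concrete, here is a minimal sketch of the idea. It is written in Python for brevity, while the project itself is PHP. `StreamingRecordParser`, `parse_in_chunks`, and the newline-delimited format are hypothetical stand-ins for illustration, not the API of any parser tracked in this issue. The point is only that the parser consumes input chunk by chunk and exposes a small, serializable cursor, so that a later request can pick up where an earlier one stopped.

```python
from dataclasses import dataclass


@dataclass
class StreamingRecordParser:
    """Hypothetical sketch: splits a byte stream into newline-terminated records."""

    bytes_consumed: int = 0  # offset just past the last fully emitted record
    pending: bytes = b""     # incomplete trailing record carried between chunks

    def feed(self, chunk: bytes):
        """Accept the next chunk and yield every record it completes."""
        self.pending += chunk
        while True:
            end = self.pending.find(b"\n")
            if end == -1:
                return  # need more input before another record is complete
            record = self.pending[:end]
            self.pending = self.pending[end + 1:]
            self.bytes_consumed += end + 1
            yield record

    def pause(self) -> dict:
        """Return a serializable cursor. Pending bytes are dropped and re-read on resume."""
        return {"bytes_consumed": self.bytes_consumed}

    @classmethod
    def resume(cls, cursor: dict) -> "StreamingRecordParser":
        """Rebuild a parser from a cursor saved by pause()."""
        return cls(bytes_consumed=cursor["bytes_consumed"])


def parse_in_chunks(data: bytes, chunk_size: int, parser: StreamingRecordParser, max_records=None):
    """Read `data` from the parser's cursor onward, never holding more than one chunk plus a partial record."""
    records = []
    offset = parser.bytes_consumed
    while offset < len(data):
        chunk = data[offset:offset + chunk_size]
        offset += len(chunk)
        for record in parser.feed(chunk):
            records.append(record)
            if max_records is not None and len(records) >= max_records:
                return records  # simulate running out of time mid-stream
    return records
```

A caller could parse two records, store the result of `parser.pause()` anywhere that accepts a small dictionary, and later continue with `StreamingRecordParser.resume(cursor)`. A real XML or WXR parser would need a richer cursor (for example, the stack of open elements), which this sketch deliberately leaves out.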

WordPress also needs importers. Not just any importers, but importers that can handle large quantities of data from a multitude of data formats, are extensible, and can proceed even when they encounter an error in the middle of the process. The WP_Stream_Importer class explored in this project is designed to fulfill these goals – see specific PRs below.
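As a rough illustration of "proceed even when they encounter an error", the loop below records each failure and keeps going instead of aborting the whole import. This is a Python sketch under stated assumptions, not the WP_Stream_Importer API: `ImportReport`, `import_entities`, and the shape of an entity (a dictionary with an `id` key) are all made up for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class ImportReport:
    """Hypothetical summary of one import run."""

    imported: list = field(default_factory=list)  # ids of entities that were imported
    failed: list = field(default_factory=list)    # (entity id, error message) pairs


def import_entities(entities: Iterable[dict], import_one: Callable[[dict], None]) -> ImportReport:
    """Import entities one at a time; a failure is recorded, not fatal."""
    report = ImportReport()
    for entity in entities:
        try:
            import_one(entity)
        except Exception as error:
            # Keep enough context to show the user and to retry later.
            report.failed.append((entity["id"], str(error)))
            continue
        report.imported.append(entity["id"])
    return report


def strict_import(entity: dict) -> None:
    """Stand-in for real import logic: rejects entities without a title."""
    if not entity.get("title"):
        raise ValueError("missing title")
```

Calling `import_entities(posts, strict_import)` returns a report whose `failed` list could drive the kind of recovery UI described in this issue (retry, skip, or supply a replacement).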

Finally, WordPress needs user and developer tools to use these importers. Not just any tools, but tools that work on the web, in the CLI, and in the Playground, guide the user with useful progress updates, and provide useful recovery paths when the inevitable errors occur. The work tracked here focuses on a wp-admin page, but the PHP software components are designed for easy reuse outside of wp-admin.

Tracking – ongoing Issues and PRs

Parsing

Exporting

Importing

Data formats

Reliability

UI

Other

Related resources

Next phases: Future Data Liberation roadmap

Note

The ideas below are the next phases of the project. They stretch far beyond the medium-term importers work tracked in this issue and only live here to paint the big picture.

  • WXR imports
    • Fork https://github.com/humanmade/WordPress-Importer. Give attribution to the original team, ping them and start a conversation
    • Port it to WP_XML_Tag_Processor
    • Start using that fork for importing WXR files in Playground
    • Rewrite the imported site URLs
    • Use AsyncHTTP\Client for fetching assets
    • Make it resumable if it fails halfway through
    • Report progress information to the user
    • Surface errors to the user, ask how to handle them
    • Use in Blueprints
    • Sort the imported entities in topological order
    • Test with tricky inputs
    • Create a WP-CLI command
    • Create a good-looking wp-admin page
    • Publish it as a standalone plugin to start gathering feedback and bug reports
  • Extensibility
  • Markdown workflow for editing existing documentation sites from GitHub
    • Markdown importer
    • Markdown exporter – migrate @dmsnell's Markdown <-> Block markup TypeScript converter from https://github.com/dmsnell/blocky-formats to PHP
    • Discuss using Playground to edit Playground docs, Gutenberg docs, and potentially all WordPress docs
    • Discuss using it as a drop-in static site generator replacement (e.g. Jekyll)
  • Static block markup editor
  • Reliable Playground ZIP export / import
    • Fork the Sandbox Site plugin
    • Improve the SQL export to make it streamable and ensure there are absolutely no issues with escaping
    • Rewrite the exported and imported site URLs
    • Include extension points to enable custom treatment of any block attribute, database row etc. See one of the GitHub discussions referenced in Kickoff Data Liberation: Let's Build WordPress-first Data Migration Tools #1888
    • Consider shipping .sql files with the export to potentially enable importing the resulting .zip in a regular MySQL-based server environment
    • ...anything else actually?
  • "Duplicate Playground" feature
    • Iteration 1: Pipe the ZIP export to ZIP import
    • Iteration 2: Mount /wordpress-new in the duplicated Playground instance, run the PHP export/import code to migrate the site from /wordpress there
    • Iteration 3: Keep track of progress, make it resumable regardless of when the process is interrupted. This would enable exporting really big sites
  • Direct WordPress <-> WordPress transfer
    • Conceptually, this is like running Duplicate Playground over the internet
    • It is important to track progress and resource versions using a vector clock
    • Export / Import UI with scope (users? posts? etc.), error info (image.jpg couldn't be fetched after 3 retries), and an error resolution mechanism (specify a different URL? upload that image? retry a 4th time?)
  • Live WordPress <-> WordPress data sync
    • Run the WordPress <-> WordPress transfer in a continuous way.
    • This is not about collaborative editing in the block editor, although there is likely an overlap around data synchronization.
  • Importers version 2 and beyond
    • Subtasks outlined in [Data Liberation] Entity Stream Importer
    • Import one post at a time, not "all static assets" and then "all posts". Identify each post's dependency graph and frontload that post's dependent data first.
    • Resume downloading .partial assets when a paused import is resumed.
    • Resource quotas
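
The roadmap items "Sort the imported entities in topological order" and "frontload that post's dependent data first" both come down to ordering entities so that each one is imported only after everything it depends on. Below is a small sketch of that idea using Python's standard `graphlib` module (Python 3.9+). The entity names and the dependency map are invented for illustration; how a real importer would discover these dependencies is not specified here.

```python
from graphlib import CycleError, TopologicalSorter


def import_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return entity ids ordered so that dependencies come before dependents.

    `dependencies` maps each entity id to the set of ids it depends on.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical site: a post needs its author, category, and attachment first;
# a comment needs the post it belongs to.
site = {
    "author:1": set(),
    "category:news": set(),
    "attachment:image.jpg": {"author:1"},
    "post:42": {"author:1", "category:news", "attachment:image.jpg"},
    "comment:7": {"post:42"},
}

order = import_order(site)
```

The exact order among independent entities is not guaranteed, only that every entity appears after its dependencies. Circular references (for example, two posts linking to each other) raise `CycleError`, so a real importer would need a strategy for those, such as importing both and fixing up the links afterwards.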
