[Data Liberation] WP_Stream_Importer: User-driven incremental import #2013
Exploratory PR to keep track of the import state so that, upon crash, the next run may seamlessly resume where the previous one left off.
…dest entity whose downloads were finalized
…()/seek() methods
stage. Identify downloaded resources by their URL.
…db; Add UI to browse it
An excellent step forward, Adam, I like it. Using custom post types is a good idea. I am OK with merging this. I just left a couple of comments.
packages/playground/data-liberation/tests/WPStreamImporterTests.php
break;
}

$post_id = wp_insert_post(
Using custom post types is a great idea. 👍
Thank you! I was looking for a way to reuse as much of what we already have as possible. A custom table crossed my mind, and we still might need one for the vector clock, but for managing metadata, post types and post meta seem perfect.
Yes. In the near future, we will probably need a place to save binary data and similar payloads. But for now, using custom post types for this is perfectly fine.
Adds wp-admin support for incrementally importing data from WXR files.
This is a part of #1894
Implementation details
There can be one active import session at any given time. It is started by uploading a WXR file or specifying a URL, and the mechanism can be extended to any number of data sources. Once the session is created, the admin page shows the current import progress. This PR adds a WP_Import_Session model class to store the progress information and the current import cursor.
Given an active import session, the admin page shows the current stage and the number of imported entities, accompanied by a "Continue Importing" button. When pressed, it calls WP_Stream_Importer::next_step() one or more times to perform a small unit of work. After each call, we collect the progress information from WP_Stream_Importer, be it the number of downloaded asset bytes, the number of inserted database records, the current import cursor, etc.
next_step() returns true when some progress was made, even if that was a failed image download attempt. It returns false when it reaches the end of the current importing stage, at which point the advance_to_next_stage() method must be called.
After each next_step() or advance_to_next_stage() call, WP_Stream_Importer::get_reentrancy_cursor() returns a string that can be used to create a new importer that will resume from the exact same place. The cursor means "we got this far", not "we got this far and no further". The record the cursor points to may have already been processed. In upcoming PRs we'll need to either point to the next entity or invent idempotent import semantics, where processing the same record twice leads to the same outcome as processing it once.
Resource Budgets
This PR starts exploring resource budgets by introducing a soft time limit and a minimum number of files downloaded during a single frontloading session. We don't support partial download and resuming yet, so we can't settle for downloading less than one file. On the next attempt we'd just discard the result and likely download less than one file again, meaning we would never get past the frontloading step.
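The stepping protocol and the budget rule described above can be sketched together as follows. This is only an illustration under stated assumptions: FakeImporter is a hypothetical stand-in that borrows the next_step(), advance_to_next_stage(), and get_reentrancy_cursor() method names from this PR, while run_budgeted_session(), its parameters, the boolean return value of advance_to_next_stage(), and the JSON cursor format are invented for the example and are not the actual WP_Stream_Importer code.

```php
<?php
// Hypothetical stand-in for WP_Stream_Importer. Only the three public method
// names come from the PR description; everything else is assumed.
class FakeImporter {
	private $stages; // Number of work units in each stage, e.g. array( 2, 3 ).
	private $stage  = 0;
	private $offset = 0;

	public function __construct( array $stages, $cursor = null ) {
		$this->stages = $stages;
		if ( null !== $cursor ) {
			// Assumed cursor format: JSON with the stage index and the offset.
			$state        = json_decode( $cursor, true );
			$this->stage  = $state['stage'];
			$this->offset = $state['offset'];
		}
	}

	// True when a unit of work was done, false at the end of the current stage.
	public function next_step() {
		if ( $this->stage >= count( $this->stages ) ) {
			return false;
		}
		if ( $this->offset >= $this->stages[ $this->stage ] ) {
			return false;
		}
		++$this->offset;
		return true;
	}

	// Moves to the next stage. Returns true while another stage remains (assumed).
	public function advance_to_next_stage() {
		if ( $this->stage >= count( $this->stages ) ) {
			return false;
		}
		++$this->stage;
		$this->offset = 0;
		return $this->stage < count( $this->stages );
	}

	public function get_reentrancy_cursor() {
		return json_encode( array( 'stage' => $this->stage, 'offset' => $this->offset ) );
	}
}

// Runs steps until the soft time limit has passed, but never stops before
// $min_steps units of work are done (mirroring "at least one file").
function run_budgeted_session( FakeImporter $importer, $soft_limit_seconds, $min_steps ) {
	$started = microtime( true );
	$steps   = 0;
	while ( true ) {
		$over_budget = ( microtime( true ) - $started ) >= $soft_limit_seconds;
		if ( $over_budget && $steps >= $min_steps ) {
			break; // Budget spent and the minimum amount of work is done.
		}
		if ( $importer->next_step() ) {
			++$steps;
		} elseif ( ! $importer->advance_to_next_stage() ) {
			break; // No more stages: the import is finished.
		}
	}
	return array(
		'steps'  => $steps,
		'cursor' => $importer->get_reentrancy_cursor(),
	);
}

// A soft limit of 0 seconds still performs the minimum of one step.
$first = run_budgeted_session( new FakeImporter( array( 2, 3 ) ), 0.0, 1 );
// A new importer created from the saved cursor resumes where the first stopped.
$second = run_budgeted_session( new FakeImporter( array( 2, 3 ), $first['cursor'] ), 60.0, 1 );
```

In this sketch the cursor stores the offset of the next unit of work, so resuming never repeats a step. The cursor in this PR makes a weaker promise, since the record it points to may already have been processed.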
Testing instructions
cd packages/playground/data-liberation/tests/import
bash run.sh
packages/playground/data-liberation/tests/wxr/a11y-unit-test-data.xml