[Data Liberation] WP_Stream_Importer with support for WXR and Markdown files #1982

Merged: 47 commits, Nov 18, 2024
4f1bb29
Add Markdown -> Blocks reader
adamziel Nov 4, 2024
65728f3
Write Playground CLI errors to stdout, don't reuse partial downloads
adamziel Nov 4, 2024
9588634
Adjust WXR reader data shape
adamziel Nov 4, 2024
51e7900
Add WP_Entity_Importer class
adamziel Nov 4, 2024
83f35f1
Add test scripts
adamziel Nov 4, 2024
b5bdbb3
Lint
adamziel Nov 4, 2024
141cdfa
Experiment: Source markdown files from a nested directory tree
adamziel Nov 4, 2024
c46d973
Parse frontmatter, try importing all the doc pages into WordPress usi…
adamziel Nov 4, 2024
48ee5a0
Decently working markdown importing
adamziel Nov 5, 2024
58150bb
Assign GUIDs
adamziel Nov 5, 2024
ddf2406
Use local path as GUID
adamziel Nov 5, 2024
f787f6a
WP_Markdown_Directory_Tree_Reader to load markdown from a directory tree
adamziel Nov 7, 2024
f9ba8cb
Rename Markdown HTML API
adamziel Nov 7, 2024
cd80834
Move specialized APIs to their own subfolders
adamziel Nov 7, 2024
b855aef
Move imported entity data to a dedicated WP_Imported_Entity class
adamziel Nov 7, 2024
0f2f428
Explore imperative connection of byte stream to wxr reader
adamziel Nov 7, 2024
3878998
Rewrite URLs in the imported content
adamziel Nov 7, 2024
40f0028
Prototype entity import with image downloading and URL rewriting
adamziel Nov 8, 2024
37cdb76
Document how two passes are needed
adamziel Nov 8, 2024
6dd046f
Add a comment to do two passes on markdown
adamziel Nov 8, 2024
75427b3
Fetch attachments when importing a markdown file
adamziel Nov 13, 2024
acce0b4
Remove unused code
adamziel Nov 13, 2024
560252c
Further document the two-pass approach
adamziel Nov 13, 2024
5fb3073
Support multiple URLs in wp_rewrite_urls
adamziel Nov 13, 2024
26652a9
Two pass Markdown import, move URL rewriting logic to WP_Block_Markup…
adamziel Nov 14, 2024
1c74f5a
Create attachments when importing markdown files
adamziel Nov 14, 2024
a9205e1
Harmonize WXR import logic with Markdown import
adamziel Nov 15, 2024
8087f3e
Frontload assets during WXR import
adamziel Nov 15, 2024
d2c71d2
Support importing parent IDs
adamziel Nov 15, 2024
d01be29
First stab at a common abstraction for streaming content importers
adamziel Nov 15, 2024
e5b79cc
Sort out relative vs absolute vs base URL nuances to get the common W…
adamziel Nov 15, 2024
7ac2133
Remove addressed todos, expand docstrings
adamziel Nov 15, 2024
82bd95e
Rename "from_stream" and "from_strig" to "create_for_streaming" and "…
adamziel Nov 15, 2024
81c47f7
Move Stream_Importer and Markdown_Importer to separate files
adamziel Nov 16, 2024
e6b713b
Simplify the implode() call
adamziel Nov 16, 2024
d0cf271
Adjust docstrings
adamziel Nov 16, 2024
a376f05
Use empty list of kses attributes in tests
adamziel Nov 16, 2024
94ea43d
Update docstrings
adamziel Nov 16, 2024
97da630
Update docstrings
adamziel Nov 16, 2024
e55106e
Adjust type in docs
adamziel Nov 16, 2024
5238736
Explore re-entrancy – implement more byte readers to see what pattern…
adamziel Nov 17, 2024
bf4b53d
Replace StreamChain with much simpler connect_upstream() method.
adamziel Nov 17, 2024
9c01981
Restore functional URL rewriting
adamziel Nov 18, 2024
87b5d28
Add inline comment to resume()
adamziel Nov 18, 2024
b977619
Lint
adamziel Nov 18, 2024
ac62590
Exclude WXR_Importer.php from phpcs
adamziel Nov 18, 2024
9d68196
Use the correct case in require statement in bootstrap.php
adamziel Nov 18, 2024
Explore imperative connection of byte stream to wxr reader
adamziel committed Nov 7, 2024
commit 0f2f4283ebd7f0385a6139ab5a91fec7f908165e
16 changes: 16 additions & 0 deletions packages/playground/data-liberation/plugin.php
@@ -30,6 +30,22 @@
}
return;

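$wxr_reader = WP_WXR_Reader::from_stream();
$bytes = new WP_File_Byte_Stream($wxr_path);
// Push the file into the reader chunk by chunk until either side is done.
while($bytes->next_bytes() && !$wxr_reader->is_finished()) {
    $wxr_reader->append_bytes($bytes->get_bytes());
    if($bytes->is_finished()) {
        // Signal that no more input will arrive.
        $wxr_reader->input_finished();
    }
    // Import every entity the reader can produce from the bytes seen so far.
    while($wxr_reader->next_entity()) {
        $importer->import_entity($wxr_reader->get_entity());
    }
    // Stop on the first reader error and dump it for debugging.
    if($wxr_reader->get_last_error()) {
        var_dump($wxr_reader->get_last_error());
        die();
    }
}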

// $wxr_path = __DIR__ . '/tests/fixtures/wxr-simple.xml';
// $wxr_path = __DIR__ . '/tests/wxr/woocommerce-demo-products.xml';
$wxr_path = __DIR__ . '/tests/wxr/a11y-unit-test-data.xml';
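
The loop in this commit pushes bytes into the reader and drains entities after every chunk. The sketch below reproduces that control flow so it can be run in isolation. `Sketch_String_Byte_Stream`, `Sketch_Line_Reader`, and `sketch_read_entities()` are hypothetical stand-ins invented for this illustration, and treating each newline-terminated line as an "entity" is an assumption; none of this is the PR's WXR parsing logic. Only the method names mirror the calls visible in the diff.

```php
<?php
// Hypothetical stand-ins for illustration only; these are not classes from the PR.

// Hands out a string in fixed-size chunks, mimicking a chunked byte source.
class Sketch_String_Byte_Stream {
    private $data;
    private $chunk_size;
    private $offset = 0;
    private $chunk = '';

    public function __construct($data, $chunk_size) {
        $this->data = $data;
        $this->chunk_size = $chunk_size;
    }

    // Advances to the next chunk; returns false once the data is exhausted.
    public function next_bytes() {
        if ($this->offset >= strlen($this->data)) {
            return false;
        }
        $this->chunk = substr($this->data, $this->offset, $this->chunk_size);
        $this->offset += strlen($this->chunk);
        return true;
    }

    public function get_bytes() {
        return $this->chunk;
    }

    public function is_finished() {
        return $this->offset >= strlen($this->data);
    }
}

// Push-based reader: buffers incoming bytes and emits one entity per complete line.
class Sketch_Line_Reader {
    private $buffer = '';
    private $input_finished = false;
    private $entity = null;

    public function append_bytes($bytes) {
        $this->buffer .= $bytes;
    }

    public function input_finished() {
        $this->input_finished = true;
    }

    // True when a complete entity is ready; false when more input is needed
    // or everything has been consumed.
    public function next_entity() {
        $newline_at = strpos($this->buffer, "\n");
        if (false !== $newline_at) {
            $this->entity = substr($this->buffer, 0, $newline_at);
            // Cast keeps the buffer a string on PHP versions where substr() may return false.
            $this->buffer = (string) substr($this->buffer, $newline_at + 1);
            return true;
        }
        // A trailing line without a newline is complete only once the input has ended.
        if ($this->input_finished && '' !== $this->buffer) {
            $this->entity = $this->buffer;
            $this->buffer = '';
            return true;
        }
        return false;
    }

    public function get_entity() {
        return $this->entity;
    }

    public function is_finished() {
        return $this->input_finished && '' === $this->buffer;
    }
}

// Same control flow as the commit: append a chunk, then drain all ready entities.
function sketch_read_entities($data, $chunk_size) {
    $bytes = new Sketch_String_Byte_Stream($data, $chunk_size);
    $reader = new Sketch_Line_Reader();
    $entities = array();
    while ($bytes->next_bytes() && !$reader->is_finished()) {
        $reader->append_bytes($bytes->get_bytes());
        if ($bytes->is_finished()) {
            $reader->input_finished();
        }
        while ($reader->next_entity()) {
            $entities[] = $reader->get_entity();
        }
    }
    return $entities;
}
```

Calling `sketch_read_entities("post:1\npost:2\npage:3", 4)` feeds the input four bytes at a time, so entity boundaries fall in the middle of chunks. Draining the reader after every `append_bytes()` call keeps the buffer small, which is presumably the point of connecting the byte stream imperatively rather than loading the whole file first.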