Explore Data Liberation exporter #2078

Draft: wants to merge 6 commits into base: trunk
5 changes: 5 additions & 0 deletions packages/playground/data-liberation/bootstrap.php
@@ -10,6 +10,10 @@
require_once __DIR__ . '/blueprints-library/src/WordPress/AsyncHttp/HttpError.php';
require_once __DIR__ . '/blueprints-library/src/WordPress/AsyncHttp/Connection.php';
require_once __DIR__ . '/blueprints-library/src/WordPress/AsyncHttp/Client.php';
require_once __DIR__ . '/blueprints-library/src/WordPress/Zip/ZipStreamWriter.php';
require_once __DIR__ . '/blueprints-library/src/WordPress/Zip/ZipFileEntry.php';
require_once __DIR__ . '/blueprints-library/src/WordPress/Zip/ZipCentralDirectoryEntry.php';
require_once __DIR__ . '/blueprints-library/src/WordPress/Zip/ZipEndCentralDirectoryEntry.php';

require_once __DIR__ . '/src/byte-readers/WP_Byte_Reader.php';
require_once __DIR__ . '/src/byte-readers/WP_File_Reader.php';
@@ -64,6 +68,7 @@
require_once __DIR__ . '/src/import/WP_Entity_Iterator_Chain.php';
require_once __DIR__ . '/src/import/WP_Retry_Frontloading_Iterator.php';
require_once __DIR__ . '/src/import/WP_Markdown_Importer.php';
require_once __DIR__ . '/src/export/WP_Exporter.php';

require_once __DIR__ . '/src/utf8_decoder.php';

12 changes: 12 additions & 0 deletions packages/playground/data-liberation/plugin.php
@@ -657,3 +657,15 @@ function () {
);
}
);

add_action('wp_loaded', 'data_liberation_maybe_test_export');
function data_liberation_maybe_test_export() {
	$request_path = parse_url($_SERVER["REQUEST_URI"], PHP_URL_PATH);
	if( $request_path !== '/_data_liberation_test_export' ) {
		return;
	}

	$exporter = new WP_Exporter();
	$exporter->stream_export();
	die();
}
72 changes: 72 additions & 0 deletions packages/playground/data-liberation/src/export/WP_Exporter.php
@@ -0,0 +1,72 @@
<?php

use WordPress\Zip\ZipStreamWriter;

class WP_Exporter {
	public static function stream_export( $output_stream = false ) {
		// @TODO: This is a hack. Maybe we should have a way to export without setting headers.
		$preexisting_response_headers = headers_list();

		require_once ABSPATH . 'wp-admin/includes/export.php';
		ob_start();
		export_wp();

		// @TODO: This is a hack to avoid headers set by export_wp(). Maybe we should have a way to export without setting headers.
		header_remove();
		foreach ( $preexisting_response_headers as $header ) {
			header( $header, false );
		}

		$wxr_content = ob_get_clean();

		// @TODO: Replace upload URLs with relative file URLs.

		$uploads = wp_upload_dir();
		// @TODO: This is a hack and kind of broken. Replace attachment URLs using proper XML and URL parsing libraries.
		$wxr_content = str_replace(
			trailingslashit( $uploads['baseurl'] ),
			'file://./wp-content/uploads/',
			$wxr_content
		);

		header('Content-Type: application/zip');

		// @TODO: Can we get rid of this open-stdout-on-demand workaround?
		// NOTE: Opening stdout on demand after output buffering the export
		// because output buffering seemed to interfere with a preexisting stdout stream.
		// By opening stdout after output buffering, streaming the zip to stdout appears to work.
		if ( !$output_stream ) {
			$output_stream = fopen('php://output', 'wb');
		}
		$zip_writer = new ZipStreamWriter( $output_stream );
Review comment from @adamziel (Collaborator), Dec 13, 2024:

Let's try packaging the data using the same WP_Entity objects as the importer. We could then have a single streaming export pipeline that knows how to deal with entities on one end and uses arbitrary export drivers on the other end, e.g. WXR, Markdown, HTML, etc.

Even more importantly, we could serialize the exported entities, send them over the wire, and import them without using any particular data format. That's important for the site sync protocol and for things like the Try WordPress extension. Plus, we could extend it to more data types, e.g. SQL dumps, Blueprint steps, etc.

Reply from the PR author (Member):

@adamziel, I don't know exactly what this means yet but will look at the importer work for reference.

Reply from the PR author (Member):

I spoke with Adam, and what we are talking about is basically making a WP_Entity iterator API that can be used to read WP entities from a site. Then the entity iterator API can be used to implement multiple exporters.

Reply from the PR author (Member):

The plan for this PR is to just fix the URL replacement so it works properly, and then leave the PR open as a draft until it can be replaced with a proper exporter based on the entity iterator.
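
To make the idea in this thread more concrete, here is a minimal sketch of an entity iterator feeding interchangeable export drivers. Every name in it (`Sketch_Entity`, `Sketch_Export_Driver`, `sketch_read_entities()`, `sketch_export()`) is hypothetical and exists in neither this PR nor the importer; the real `WP_Entity` class may have a different shape, and a real reader would query the site rather than iterate over an array.

```php
<?php
// Hypothetical sketch only: none of these names exist in this PR or in WordPress.

// A minimal stand-in for an exported entity (a post, a comment, a term, ...).
final class Sketch_Entity {
	public $type;
	public $data;

	public function __construct( string $type, array $data ) {
		$this->type = $type;
		$this->data = $data;
	}
}

// An export driver turns one entity into bytes in some output format.
interface Sketch_Export_Driver {
	public function write( Sketch_Entity $entity ): string;
}

// Driver 1: one JSON document per line, a format-neutral wire representation.
final class Sketch_Json_Lines_Driver implements Sketch_Export_Driver {
	public function write( Sketch_Entity $entity ): string {
		$payload = array(
			'type' => $entity->type,
			'data' => $entity->data,
		);
		return json_encode( $payload ) . "\n";
	}
}

// Driver 2: Markdown, which only knows how to render posts.
final class Sketch_Markdown_Driver implements Sketch_Export_Driver {
	public function write( Sketch_Entity $entity ): string {
		if ( 'post' !== $entity->type ) {
			return '';
		}
		return '# ' . $entity->data['title'] . "\n\n" . $entity->data['content'] . "\n";
	}
}

// The "iterator" end of the pipeline. A generator keeps memory use flat
// because entities are produced one at a time.
function sketch_read_entities( array $rows ): Generator {
	foreach ( $rows as $row ) {
		yield new Sketch_Entity( $row['type'], $row['data'] );
	}
}

// The pipeline itself knows nothing about the output format.
function sketch_export( iterable $entities, Sketch_Export_Driver $driver, $output_stream ): void {
	foreach ( $entities as $entity ) {
		fwrite( $output_stream, $driver->write( $entity ) );
	}
}
```

Under these assumptions, switching from a Markdown export to a serialized stream means passing a different driver to `sketch_export()`; the code that reads entities stays the same.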

		$zip_writer->writeFileFromString( 'META-INF/export.wxr', $wxr_content );

		$uploads_path = $uploads['basedir'];

		$flags = \FilesystemIterator::SKIP_DOTS;
		$uploads_iterator = new \RecursiveIteratorIterator(
			new \RecursiveDirectoryIterator(
				$uploads_path,
				$flags
			)
		);

		foreach ( $uploads_iterator as $file ) {
			if ( $file->isDir() ) {
				continue;
			}
			$absolute_path = $file->getPathname();
			$relative_path = substr( $absolute_path, strlen($uploads_path) + 1 );
			$zip_writer->writeFileFromPath(
				// TODO: How to handle unconventional upload locations?
				"wp-content/uploads/$relative_path",
				$absolute_path
			);

			// TODO: Is this necessary to make sure per-file output is flushed?
			fflush( $output_stream );
		}

		$zip_writer->finish();
	}
}
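
The `@TODO` next to the `str_replace()` call in `stream_export()` asks for URL-aware replacement. One possible direction, sketched below, is to parse each candidate URL and rewrite it only when its scheme, host, port, and path prefix all match the uploads base URL. The helper name `sketch_rewrite_upload_url()` is hypothetical and not part of this PR. The sketch handles a single URL at a time, so extracting URLs from the WXR document is still left to an XML parser, and it deliberately drops query strings and fragments.

```php
<?php
// Hypothetical helper, not part of the PR: rewrites one URL if it points
// inside the uploads directory, and returns it unchanged otherwise.
function sketch_rewrite_upload_url( string $url, string $uploads_base_url, string $replacement_base ): string {
	$url_parts  = parse_url( $url );
	$base_parts = parse_url( $uploads_base_url );
	if ( false === $url_parts || false === $base_parts ) {
		return $url;
	}

	// Scheme, host, and port must match exactly (case-insensitively).
	// Note: this treats http:// and https:// as different sites.
	foreach ( array( 'scheme', 'host', 'port' ) as $key ) {
		$url_value  = isset( $url_parts[ $key ] ) ? strtolower( (string) $url_parts[ $key ] ) : '';
		$base_value = isset( $base_parts[ $key ] ) ? strtolower( (string) $base_parts[ $key ] ) : '';
		if ( $url_value !== $base_value ) {
			return $url;
		}
	}

	// Compare paths with a trailing slash on the base so that
	// "/wp-content/uploads-backup/" does not match "/wp-content/uploads/".
	$base_path = rtrim( $base_parts['path'] ?? '', '/' ) . '/';
	$url_path  = $url_parts['path'] ?? '';
	if ( 0 !== strpos( $url_path, $base_path ) ) {
		return $url;
	}

	// Keep only the part of the path below the uploads directory.
	$relative_path = substr( $url_path, strlen( $base_path ) );
	return rtrim( $replacement_base, '/' ) . '/' . $relative_path;
}
```

Called with the values used in `stream_export()`, i.e. `$uploads['baseurl']` as the base and `'file://./wp-content/uploads/'` as the replacement, the helper leaves URLs on other hosts or outside the uploads path untouched.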