Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

create no-embbed option for exporting issues #4799

Open
wants to merge 5 commits into
base: stable-3_3_0
Choose a base branch
from

Conversation

Godoy0722
Copy link
Contributor

Description: I added a --no-embbed option for the CLI and the system interface. It reduces the memory usage, as for faster responses and lighter XML files. It solves the issue #9769.

@jonasraoni jonasraoni linked an issue Apr 1, 2025 that may be closed by this pull request
Copy link
Contributor

@jonasraoni jonasraoni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Godoy0722 I've left some comments.

One very important thing that's missing is the way back. I mean, it should be possible to import this new format.

Please, also take a look if there are no more <embed>s in the other applications (OPS and OMP).

{/fbvFormSection}

{fbvFormSection list="true" title="plugins.importexport.native.exportOptions"}
{fbvElement type="checkbox" id="no-embed" name="no-embed" label="plugins.importexport.native.noEmbedOption" checked=$noEmbed|default:false}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great that you've included the option on the UI, then it makes sense to include the other alternative, it should be a radio (embed | relative | URL).

@@ -135,7 +135,7 @@
<element name="original_file_name" type="string" minOccurs="1" maxOccurs="1" />
<element name="date_uploaded" type="date" minOccurs="1" maxOccurs="1" />
<element name="date_modified" type="date" minOccurs="1" maxOccurs="1" />
<element name="embed" type="pkp:embed" />
<element name="embed" type="pkp:embed" minOccurs="0" maxOccurs="1" />
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Include the href element (min 0/max 1), with the attributes href and mime_type.
  • The schema should specify that there should be one <href> OR one <embed>

p.s.: There's another occurrence of this update.

$embedNode = $doc->createElementNS($deployment->getNamespace(), 'embed', base64_encode(file_get_contents($filePath)));
$embedNode->setAttribute('encoding', 'base64');
$issueFileNode->appendChild($embedNode);
if (empty($this->opts['no-embed'])) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wherever you added support for the no-embed, there should be also the URL (I didn't check, but I assume it's possible), just to keep the standard.

$request = Application::get()->getRequest();
$baseUrl = $request->getBaseUrl();
$context = $deployment->getContext();
$fileUrl = $baseUrl . '/' . $context->getPath() . '/public/' . $coverImage;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This public path isn't standard, you need to use the class PublicFileManager.

@Godoy0722
Copy link
Contributor Author

@jonasraoni I just made the requested updates. I'll also take a look at OMP and OPS and will open PR forms for them if it's possible to change it there. I also see that my validation was not used anywhere - I forgot to implement the validation, but it's not required, so I removed it.

Comment on lines +237 to +238
$imageOption = $request->getUserVar('image-option');
switch ($imageOption) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not just about images, so it's better to use a generic name

Suggested change
$imageOption = $request->getUserVar('image-option');
switch ($imageOption) {
$serializationMode = $request->getUserVar('serializationMode');

Comment on lines +134 to +135
$issueFileNode->appendChild($node = $doc->createElementNS($deployment->getNamespace(), 'date_uploaded', $dateUploaded));
$issueFileNode->appendChild($node = $doc->createElementNS($deployment->getNamespace(), 'date_modified', $dateModified));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand these changes were motivated because the strftime is deprecated. But as there's a polyfill, it's better to revert. I mean, we are trying to decrease the amount of updates on stable branches.

Also, this whole branch is going to die in 1 year after the 3.5 is released, so it's not even worth to fix minor problems (e.g. /* @var type */ instead of /** @var type */).

Suggested change
$issueFileNode->appendChild($node = $doc->createElementNS($deployment->getNamespace(), 'date_uploaded', $dateUploaded));
$issueFileNode->appendChild($node = $doc->createElementNS($deployment->getNamespace(), 'date_modified', $dateModified));
$issueFileNode->appendChild($node = $doc->createElementNS($deployment->getNamespace(), 'date_uploaded', strftime('%Y-%m-%d', strtotime($issueFile->getDateUploaded()))));
$issueFileNode->appendChild($node = $doc->createElementNS($deployment->getNamespace(), 'date_modified', strftime('%Y-%m-%d', strtotime($issueFile->getDateModified()))));

Comment on lines +124 to +132
$dateUploaded = $issueFile->getDateUploaded();
if (is_string($dateUploaded)) {
$dateUploaded = substr($dateUploaded, 0, 10); // Extract YYYY-MM-DD
}

$dateModified = $issueFile->getDateModified();
if (is_string($dateModified)) {
$dateModified = substr($dateModified, 0, 10); // Extract YYYY-MM-DD
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
$dateUploaded = $issueFile->getDateUploaded();
if (is_string($dateUploaded)) {
$dateUploaded = substr($dateUploaded, 0, 10); // Extract YYYY-MM-DD
}
$dateModified = $issueFile->getDateModified();
if (is_string($dateModified)) {
$dateModified = substr($dateModified, 0, 10); // Extract YYYY-MM-DD
}

@@ -327,7 +357,7 @@ function usage($scriptName) {
* @see PKPImportExportPlugin::executeCLI()
*/
function executeCLI($scriptName, &$args) {
$opts = $this->parseOpts($args, ['no-embed', 'use-file-urls']);
$opts = $this->parseOpts($args, ['no-embed', 'use-file-urls', 'use-absolute-urls', 'image-option:']);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit confuse, what should happen if I add use-file-urls and use-absolute-urls? I think it's better to just add the serializationMode, and on the main branch we can get rid of the other arguments (or mark them as deprecated).

Suggested change
$opts = $this->parseOpts($args, ['no-embed', 'use-file-urls', 'use-absolute-urls', 'image-option:']);
$opts = $this->parseOpts($args, ['no-embed', 'use-file-urls', 'serializationMode:']);

Comment on lines +239 to +251
case 'embed':
break;
case 'relative':
$opts['use-file-urls'] = true;
break;
case 'url':
$opts['use-file-urls'] = true;
$opts['use-absolute-urls'] = true;
break;
default:
if ($request->getUserVar('no-embed')) {
$opts['no-embed'] = true;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of adding all these options, we can have just one, the serializationMode

Suggested change
case 'embed':
break;
case 'relative':
$opts['use-file-urls'] = true;
break;
case 'url':
$opts['use-file-urls'] = true;
$opts['use-absolute-urls'] = true;
break;
default:
if ($request->getUserVar('no-embed')) {
$opts['no-embed'] = true;
}

Comment on lines +291 to +320
case 'file_url':
if (!isset($coverImage['uploadName'])) {
$deployment->addWarning(ASSOC_TYPE_PUBLICATION, $object->getId(), __('plugins.importexport.common.error.coverImageNameUnspecified'));
break;
}

import('classes.file.PublicFileManager');
$publicFileManager = new PublicFileManager();
$filePath = $publicFileManager->getContextFilesPath($context->getId()) . '/' . $coverImage['uploadName'];
$allowedFileTypes = ['gif', 'jpg', 'png', 'webp'];
$extension = pathinfo(strtolower($filePath), PATHINFO_EXTENSION);
if (!in_array($extension, $allowedFileTypes)) {
$deployment->addWarning(ASSOC_TYPE_PUBLICATION, $object->getId(), __('plugins.importexport.common.error.invalidFileExtension'));
break;
}

$imageUrl = $n->textContent;
if (empty($imageUrl)) {
$deployment->addWarning(ASSOC_TYPE_PUBLICATION, $object->getId(), __('plugins.importexport.common.error.missingFileUrl'));
break;
}

$fileContents = file_get_contents($imageUrl);
if ($fileContents === false) {
$deployment->addWarning(ASSOC_TYPE_PUBLICATION, $object->getId(), __('plugins.importexport.common.error.failedDownloadingFile', array('url' => $imageUrl)));
break;
}

file_put_contents($filePath, $fileContents);
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole piece can be removed, once you copy the implementation for the <href>

Suggested change
case 'file_url':
if (!isset($coverImage['uploadName'])) {
$deployment->addWarning(ASSOC_TYPE_PUBLICATION, $object->getId(), __('plugins.importexport.common.error.coverImageNameUnspecified'));
break;
}
import('classes.file.PublicFileManager');
$publicFileManager = new PublicFileManager();
$filePath = $publicFileManager->getContextFilesPath($context->getId()) . '/' . $coverImage['uploadName'];
$allowedFileTypes = ['gif', 'jpg', 'png', 'webp'];
$extension = pathinfo(strtolower($filePath), PATHINFO_EXTENSION);
if (!in_array($extension, $allowedFileTypes)) {
$deployment->addWarning(ASSOC_TYPE_PUBLICATION, $object->getId(), __('plugins.importexport.common.error.invalidFileExtension'));
break;
}
$imageUrl = $n->textContent;
if (empty($imageUrl)) {
$deployment->addWarning(ASSOC_TYPE_PUBLICATION, $object->getId(), __('plugins.importexport.common.error.missingFileUrl'));
break;
}
$fileContents = file_get_contents($imageUrl);
if ($fileContents === false) {
$deployment->addWarning(ASSOC_TYPE_PUBLICATION, $object->getId(), __('plugins.importexport.common.error.failedDownloadingFile', array('url' => $imageUrl)));
break;
}
file_put_contents($filePath, $fileContents);
break;

@@ -284,11 +378,93 @@ function parseIssueCover($filter, $node, $object) {

file_put_contents($filePath, base64_decode($n->textContent));
break;
case 'href':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is similar to the previous modified code block, so the same comments can be applied

Comment on lines +448 to +469
/**
* Get MIME type based on file extension
* @param $fileName string
* @return string MIME type
*/
protected function _getMimeTypeFromFileName($fileName) {
$extension = strtolower(pathinfo($fileName, PATHINFO_EXTENSION));
switch ($extension) {
case 'jpg':
case 'jpeg':
return 'image/jpeg';
case 'png':
return 'image/png';
case 'gif':
return 'image/gif';
case 'webp':
return 'image/webp';
default:
return 'application/octet-stream';
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have such method already in the codebase, but before using it, check if the application doesn't provide it through a getMimeType() (or something like that) method.

Suggested change
/**
* Get MIME type based on file extension
* @param $fileName string
* @return string MIME type
*/
protected function _getMimeTypeFromFileName($fileName) {
$extension = strtolower(pathinfo($fileName, PATHINFO_EXTENSION));
switch ($extension) {
case 'jpg':
case 'jpeg':
return 'image/jpeg';
case 'png':
return 'image/png';
case 'gif':
return 'image/gif';
case 'webp':
return 'image/webp';
default:
return 'application/octet-stream';
}
}

Comment on lines +83 to +88
msgid "plugins.importexport.common.error.missingFileUrl"
msgstr "The file_url element does not contain a URL."

msgid "plugins.importexport.common.error.failedDownloadingFile"
msgstr "Failed to download file from URL: {$url}"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be removed once you copy the other implementation.

Suggested change
msgid "plugins.importexport.common.error.missingFileUrl"
msgstr "The file_url element does not contain a URL."
msgid "plugins.importexport.common.error.failedDownloadingFile"
msgstr "Failed to download file from URL: {$url}"

@@ -125,3 +134,27 @@ msgid "plugins.importexport.native.import.error.publishedDateMissing"
msgstr ""
"The article \"{$articleTitle}\" is contained within an issue, but has no "
"published date."

msgid "plugins.importexport.common.export"
msgstr "Export"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generic notes for the locale keys:

  • Check if generic texts already exist on the localization files and re-use them, this one does exist for example as common.export.
  • In general, try to not create new locale keys on old branches, these are probably not going to be translated (due to problems in how we manage the translations across all the branches, the old branches are currently left to rot).
  • If the locale key is going to be re-used in other applications, create them on the pkp-lib submodule, then their translation will be shared.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Native XML consumes too much memory
2 participants