Replies: 2 comments
-
Hey Joshy, thanks for taking a look! The biggest issue I had recently was that some platforms generate (or at least allow) pretty bad markup, and TextFormatter doesn't like that. Things like "unclosed"

I got pretty far down the markup parsing rabbit hole last spring (which has now fully left my brain) and I think I still have some other unresolved issues, but I'll raise them when I find them again and try to get them under test.

I think Vanilla to Flarum is probably as difficult a formatting migration as exists, because Vanilla is probably the most permissive formatting system and Flarum is the least (that I've seen, anyway). Vanilla supports something like 7 formats and does very little validation in most of them.

I can see using TextFormatter more generically across migrations; I could add a column for XML to the transitional schema as a first step. I think it would be risky to try to move all the source packages to use that system at once, but I'm open to PRs that start to move in that direction.
-
In general, I'd recommend starting with the original input rather than whatever HTML the source software outputs. Most of the time, it's easier to create a bundle that accepts the same markup as a given software so you can generate some clean XML. Then you can modify the XML to either tweak some values or replace the markup. For example, once everything is parsed, you can replace the values used in

If you point me to a page that lists all of Vanilla's (for example) markup, I can probably whip up a bundle that accepts most of it.

As far as self-closing tags go, they are supported by the HTMLElements plugin. For instance:

```php
$configurator = new s9e\TextFormatter\Configurator;
$configurator->HTMLElements->allowElement('br');
$configurator->HTMLElements->allowElement('img');
$configurator->HTMLElements->allowAttribute('img', 'src');

// finalize() returns the parser and renderer; extract() creates $parser and $renderer
extract($configurator->finalize());

$text = '<img src=img.png><br><img src=x.gif /><br />';
$xml  = $parser->parse($text);
$html = $renderer->render($xml);

die("$html\n");
```

```html
<img src="img.png"><br><img src="x.gif"><br>
```

The library comes with a set of rules pre-programmed. Many of the rules ensure that the output is valid HTML. You can output invalid HTML if you configure it that way. For example:

```php
$configurator = new s9e\TextFormatter\Configurator;
$configurator->HTMLElements->allowElement('div');
$configurator->HTMLElements->allowElement('span');

// The HTMLElements plugin creates tags with a html: prefix
$configurator->tags['html:span']->rules->allowChild('html:div');

// Alternatively, allow everything, everywhere even when it breaks stuff
//$configurator->rulesGenerator->add('AllowAll');

extract($configurator->finalize());

$text = '<span><div>...</div></span>';
$xml  = $parser->parse($text);
$html = $renderer->render($xml);

die("$html\n");
```

```html
<span><div>...</div></span>
```
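Coming back to the idea of replacing the markup once everything is parsed, here is a rough sketch of what that could look like using only PHP's built-in DOM extension. It assumes the parsed XML keeps the original start and end markup inside `<s>` and `<e>` elements, as in the sample string below. The sample XML and the function names `replaceMarkup` and `toPlainText` are made up for illustration, and the `strip_tags()`/`html_entity_decode()` combination is only a stand-in for the library's own unparsing.

```php
<?php
// Sketch only. The sample XML imitates the general shape of TextFormatter's
// parsed output, where <s> and <e> wrap the original start and end markup.
// Function names are hypothetical and not part of the library.

/**
 * Replace the start/end markup of every $tagName element in the XML.
 */
function replaceMarkup(string $xml, string $tagName, string $start, string $end): string
{
    $dom = new DOMDocument;
    $dom->loadXML($xml);

    foreach ($dom->getElementsByTagName($tagName) as $tag) {
        // Only look at direct children so nested tags keep their own markup
        foreach ($tag->childNodes as $child) {
            if ($child->nodeName === 's') {
                $child->textContent = $start;
            } elseif ($child->nodeName === 'e') {
                $child->textContent = $end;
            }
        }
    }

    // Serialize the root element only, without the XML declaration
    return $dom->saveXML($dom->documentElement);
}

/**
 * Turn the XML back into plain text by dropping the tags and decoding entities.
 */
function toPlainText(string $xml): string
{
    return html_entity_decode(strip_tags($xml), ENT_QUOTES, 'UTF-8');
}

$xml = '<r><B><s>[b]</s>bold<e>[/b]</e></B> and <I><s>[i]</s>italic<e>[/i]</e></I></r>';

// Swap BBCode markers for Markdown-style ones, then recover the plain text
$xml = replaceMarkup($xml, 'B', '**', '**');
$xml = replaceMarkup($xml, 'I', '*', '*');

echo toPlainText($xml), "\n";
```

With the sample input above, this prints `**bold** and *italic*`. The resulting text would then be handed to the target software's own parser.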
-
Hi,
I subscribed to this repository a while ago because I think tools for interoperation/migration are very important. That's one of the main reasons I chose XML as the output format for s9e/TextFormatter, because every software has a way to read XML, and even if they don't want to use XML, the format is human readable and easy to understand.
Anyway, my main point is this. If you want to convert content from one piece of software to another, you can use TextFormatter even if neither of them does. The library has a concept of "bundles." A bundle is a parser/renderer set that can be configured for a given type of markup. You can create a bundle for almost any software you want to support. It will let you parse the plain text content from that software into an XML representation. You can then modify that XML with the appropriate tool. At that point, if your target software uses TextFormatter and you were able to produce the exact XML that your target software uses, you can save it as-is to the database. Otherwise, you should be able to turn it back into plain text and have your target software parse it according to its own rules.
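To make the "modify that XML" step a bit more concrete, here is a minimal sketch that rewrites an attribute value using PHP's built-in DOM extension. The `URL` tag, its `url` attribute, the sample XML, the host names and the function name `rewriteUrlHost` are all assumptions made for illustration; the real tag and attribute names depend on how the bundle is configured.

```php
<?php
// Sketch only. The sample XML imitates the general shape of parsed output;
// the URL tag and its url attribute are assumptions for illustration.

/**
 * Point every URL tag that targets $oldHost at $newHost instead.
 */
function rewriteUrlHost(string $xml, string $oldHost, string $newHost): string
{
    $dom = new DOMDocument;
    $dom->loadXML($xml);

    foreach ($dom->getElementsByTagName('URL') as $tag) {
        $url = $tag->getAttribute('url');

        // Compare the parsed host so look-alike hosts are left alone
        if (parse_url($url, PHP_URL_HOST) === $oldHost) {
            // Only the attribute changes; the text kept in <s>/<e> is untouched
            $tag->setAttribute('url', str_replace("//$oldHost", "//$newHost", $url));
        }
    }

    // Serialize the root element only, without the XML declaration
    return $dom->saveXML($dom->documentElement);
}

$xml = '<r><URL url="https://old.example/t/42">'
     . '<s>[url=https://old.example/t/42]</s>this thread<e>[/url]</e>'
     . '</URL></r>';

echo rewriteUrlHost($xml, 'old.example', 'new.example'), "\n";
```

Note that in this sketch only the attribute is updated. If the XML is later turned back into plain text, the original markup would need the same replacement.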
I made a similar offer in Flarum's forums, and if a motivated individual wants to create a migration for a specific software, I'm willing to write a bundle for it that I would maintain as part of the library.
Actually, I've been meaning to post this for a while now, and while typing this I noticed you've already submitted a Vanilla bundle to the repository, which is really cool. Let me know if I can help with any part of it.