
html parser

Rafik Djedjig edited this page May 7, 2019 · 3 revisions

Html Parser

This section covers the latest version of the Html parsing tool. Other versions, such as HtmlConverter, HtmlAdaptor and HtmlAdaptator, are obsolete and shouldn't be used. They will all be removed in the near future.

The Html Parser is a tool that renders HTML content with the native elements of React Native libraries.

The Html Parser is built with an abstract core and two concrete implementations. The abstract part describes how HTML tags are parsed, while the concrete parts describe how they are rendered on screen.

  • The text implementation renders raw text and provides some tools to extract parts of it (such as a short excerpt).
  • The rn implementation renders React Native components.
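The split between the abstract core and its concrete implementations can be pictured with the minimal sketch below. It is not the project's code: `AbstractHtmlParser`, `TextHtmlParser` and the toy `tokenize` function are hypothetical names, and the tokenizer only copes with simple, well-formed markup. The core walks the markup and fires callbacks; the concrete class decides what to output. A React Native implementation would presumably build elements in the same callbacks instead of concatenating strings.

```typescript
// Hypothetical sketch of the "abstract core + concrete implementations" split.
// None of these names come from the project; the tokenizer is a toy.

type HtmlEvent =
  | { kind: "open"; tag: string }
  | { kind: "close"; tag: string }
  | { kind: "text"; text: string };

// Toy tokenizer: emits open/close/text events for simple, well-formed markup.
// Attributes are skipped; a self-closing tag such as <br/> is reported as "open".
function tokenize(html: string): HtmlEvent[] {
  const events: HtmlEvent[] = [];
  const re = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>|([^<]+)/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    if (m[3] !== undefined) events.push({ kind: "text", text: m[3] });
    else if (m[1] === "/") events.push({ kind: "close", tag: m[2].toLowerCase() });
    else events.push({ kind: "open", tag: m[2].toLowerCase() });
  }
  return events;
}

// Abstract core: decides how the markup is walked (which callbacks fire, in which order).
abstract class AbstractHtmlParser<TOutput> {
  parse(html: string): TOutput {
    this.reset();
    for (const ev of tokenize(html)) {
      if (ev.kind === "open") this.onOpenTag(ev.tag);
      else if (ev.kind === "close") this.onCloseTag(ev.tag);
      else this.onText(ev.text);
    }
    return this.result();
  }
  protected abstract reset(): void;
  protected abstract onOpenTag(tag: string): void;
  protected abstract onCloseTag(tag: string): void;
  protected abstract onText(text: string): void;
  protected abstract result(): TOutput;
}

// Concrete implementation: decides how the result is printed (here, as plain text).
class TextHtmlParser extends AbstractHtmlParser<string> {
  private out = "";
  protected reset(): void { this.out = ""; }
  protected onOpenTag(tag: string): void {
    if (tag === "br") this.out += "\n";
  }
  protected onCloseTag(tag: string): void {
    if (tag === "p" || tag === "div") this.out += "\n";
  }
  protected onText(text: string): void { this.out += text; }
  protected result(): string { return this.out.trim(); }
}

const plain = new TextHtmlParser().parse("<p>Hello <b>world</b></p><p>Second<br/>line</p>");
console.log(JSON.stringify(plain)); // "Hello world\nSecond\nline"
```

Keeping the walk in the base class means a new output target only has to implement the five callbacks.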

Note: The Html Parser relies on a SAX library, which handles well-formed XML strings, not HTML. You may therefore get errors if you try to parse strings that are not valid XHTML. Some work has been done to make the Html Parser more permissive, but take care.

Note: The SAX library used (saxophone) loads the whole HTML string before parsing it, so the maximum length of HTML that can be parsed depends on the device's memory.
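To illustrate the first note, here is a minimal sketch of the XML rule that trips up ordinary HTML: every opened tag must be closed in order, so a bare `<br>` makes the document invalid. It also shows two pre-passes in the spirit of the `fixVoidTags` and `preventZWSP` options listed further down. Everything here (`isWellFormed`, `fixVoidTags`, `stripZWSP`, the regex-based scanning) is a hypothetical illustration, not saxophone's API or the Html Parser's implementation.

```typescript
// Hypothetical sketch: why XML (SAX) rules reject ordinary HTML, and two
// permissive pre-passes in the spirit of fixVoidTags and preventZWSP.
// This is neither saxophone's API nor the Html Parser's actual code.

const VOID_TAGS = ["area", "base", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "track", "wbr"];

// XML well-formedness check: every opened tag must be closed, in order.
function isWellFormed(markup: string): boolean {
  const stack: string[] = [];
  const re = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(\/?)>/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(markup)) !== null) {
    const [, closing, rawName, selfClosing] = m;
    const name = rawName.toLowerCase();
    if (selfClosing === "/") continue; // <br/> opens and closes at once
    if (closing === "/") {
      if (stack.pop() !== name) return false; // mismatched or stray closing tag
    } else {
      stack.push(name);
    }
  }
  return stack.length === 0; // false if some tags were never closed
}

// Rewrites void tags (<br>, <img ...>) into XML self-closing form (<br/>, <img .../>).
function fixVoidTags(html: string): string {
  const re = new RegExp(`<(${VOID_TAGS.join("|")})\\b([^>]*?)\\s*/?>`, "gi");
  return html.replace(re, (_match: string, tag: string, attrs: string) => `<${tag}${attrs}/>`);
}

// Removes zero-width space characters (U+200B).
function stripZWSP(html: string): string {
  return html.replace(/\u200B/g, "");
}

const rawHtml = "<p>Line one<br>Line\u200B two</p>";
const prepared = fixVoidTags(stripZWSP(rawHtml));
console.log(isWellFormed(rawHtml), isWellFormed(prepared)); // false true
console.log(prepared); // <p>Line one<br/>Line two</p>
```

Regex scanning like this is only good enough to show the idea; attribute values containing `>` would fool it.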

The Text implementation

For now, the text implementation relies on the old HtmlConverter tool (the one that predates HtmlParser).

The module HtmlConverterText exports a factory function that converts HTML to text. Call it to get either the full text render or only an excerpt:

import HtmlToText from "../../infra/htmlConverter/text";
// `content` is the HTML string to convert.
// Pass the HTML string as the first parameter and a boolean `ignoreLineBreaks` as the second parameter.
const converter = HtmlToText(content, false);
const render = converter._render; // full text render
const excerpt = converter.excerpt; // short excerpt
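For readers curious about what a full render versus an excerpt involves, here is a hypothetical sketch; it is not HtmlToText itself. `htmlToTextSketch`, its regex rules and the 100-character default excerpt length are all assumptions. It turns line-break and block-closing tags into newlines (or spaces when `ignoreLineBreaks` is true), strips the remaining tags, and cuts the excerpt on a word boundary.

```typescript
// Hypothetical sketch of an HTML-to-text converter offering a full render and an excerpt.
// It does not reproduce HtmlToText's real rules; names and the excerpt length are assumptions.

interface TextConversion {
  render: string;  // full plain-text render
  excerpt: string; // shortened render, cut on a word boundary
}

function htmlToTextSketch(html: string, ignoreLineBreaks: boolean, excerptLength = 100): TextConversion {
  const lineBreak = ignoreLineBreaks ? " " : "\n";
  const render = html
    .replace(/<br\s*\/?>/gi, lineBreak)            // explicit line breaks
    .replace(/<\/(p|div|li|h[1-6])>/gi, lineBreak) // end of common block-level elements
    .replace(/<[^>]+>/g, "")                       // drop every remaining tag
    .replace(/[ \t]+/g, " ")                       // collapse runs of spaces and tabs
    .replace(/ *\n */g, "\n")                      // trim spaces around line breaks
    .trim();

  let excerpt = render;
  if (render.length > excerptLength) {
    const cut = render.slice(0, excerptLength);
    const lastSpace = cut.lastIndexOf(" ");
    excerpt = (lastSpace > 0 ? cut.slice(0, lastSpace) : cut) + "...";
  }
  return { render, excerpt };
}

const sample = "<p>Hello <b>world</b></p><p>Bye</p>";
console.log(JSON.stringify(htmlToTextSketch(sample, false).render)); // "Hello world\nBye"
console.log(htmlToTextSketch(sample, true, 8).excerpt);              // Hello...
```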

The React Native implementation

This implementation handles various kinds of rich content: formatted text, images, videos (iframes) and audio.

Start by instantiating the parser with your options:

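import HtmlParserRN, { IHtmlParserRNOptions } from "../infra/htmlParser/rn";
// `opts` is an IHtmlParserRNOptions object (see the options below); `html` is the HTML string to render.
const htmlParser = new HtmlParserRN(opts);
const render = htmlParser.parse(html) as JSX.Element;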

The parser accepts the following options:

  • ignoreClass: string[] (default empty) An array of HTML class names. If a tag has a class present in this array, neither the tag nor any of its children is parsed.
  • preventZWSP: boolean (default true) Skip all "Zero-Width SPace" (ZWSP) characters, which are invisible but are not treated as regular spaces. ZWSP characters have been observed in some content generated with the ODE Framework's editor.
  • emptyDiv2Br: boolean (default true) Whether empty <div> tags are interpreted as <br> tags instead of simply being ignored.
  • parseEntities: boolean (default true) Whether the parser replaces escaped HTML entities (such as &amp;) with the characters they represent. Keeping this enabled is recommended.
  • fixVoidTags: boolean (default true) When true, replaces HTML void tags (<br> <hr> ...) with their XML self-closing equivalents (<br/> <hr/> ...).
  • textFormating: boolean (default true) Parse basic text formatting (italic, bold, underline, text color and text background color).
  • hyperlinks: boolean (default true) Parse <a> tags into links (also works with image links).
  • images: boolean (default true) Parse <img> tags. Sibling images are grouped into an image grid. Custom ODE Framework emojis are parsed as inline images.
  • iframes: boolean (default true) Parse <iframe> tags (YouTube videos, interactive content, etc.).
  • audio: boolean (default true) Parse <audio> tags.
  • globalTextStyle: TextStyle (default empty) Add style rules to any generated text elements
  • linkTextStyle: TextStyle (default empty) Add style rules to any generated link text elements
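Several of these options act on the parsed tree rather than on the raw string. The sketch below assumes a simplified node model and shows how the documented defaults could be merged with caller options, how `ignoreClass` could prune subtrees, and how runs of sibling images could be merged into a grid. `ParserOptionsSketch`, `ParsedNode`, `withDefaults` and `transform` are hypothetical; the real `IHtmlParserRNOptions` and the parser's internals may differ.

```typescript
// Hypothetical sketch of option defaults (values from the option list) and two
// tree-level behaviours: ignoreClass and grouping sibling images into a grid.
// The option interface and node model are assumptions, not IHtmlParserRNOptions.

interface ParserOptionsSketch {
  ignoreClass: string[];
  preventZWSP: boolean;
  emptyDiv2Br: boolean;
  parseEntities: boolean;
  fixVoidTags: boolean;
  textFormating: boolean;
  hyperlinks: boolean;
  images: boolean;
  iframes: boolean;
  audio: boolean;
}

// Documented defaults (style options left out for brevity).
const DEFAULT_OPTIONS: ParserOptionsSketch = {
  ignoreClass: [], preventZWSP: true, emptyDiv2Br: true, parseEntities: true, fixVoidTags: true,
  textFormating: true, hyperlinks: true, images: true, iframes: true, audio: true,
};

// Options passed by the caller override the defaults.
function withDefaults(opts: Partial<ParserOptionsSketch> = {}): ParserOptionsSketch {
  return { ...DEFAULT_OPTIONS, ...opts };
}

type ImageNode = { type: "image"; src: string };
type ParsedNode =
  | { type: "element"; tag: string; classes: string[]; children: ParsedNode[] }
  | { type: "text"; text: string }
  | ImageNode
  | { type: "imageGrid"; images: ImageNode[] };

// Drops subtrees whose tag carries an ignored class, then merges runs of
// sibling images into one grid (a single image stays a plain image).
function transform(nodes: ParsedNode[], opts: ParserOptionsSketch): ParsedNode[] {
  const out: ParsedNode[] = [];
  let run: ImageNode[] = [];
  const flush = (): void => {
    if (run.length > 1) out.push({ type: "imageGrid", images: run });
    else if (run.length === 1) out.push(run[0]);
    run = [];
  };
  for (const node of nodes) {
    if (node.type === "element" && node.classes.some((c) => opts.ignoreClass.includes(c))) {
      continue; // the tag and all of its children are skipped
    }
    if (node.type === "image") {
      if (opts.images) run.push(node); // assumption: images are dropped when the option is off
      continue;
    }
    flush();
    out.push(node.type === "element" ? { ...node, children: transform(node.children, opts) } : node);
  }
  flush();
  return out;
}

const grouped = transform(
  [
    { type: "image", src: "a.png" },
    { type: "image", src: "b.png" },
    { type: "element", tag: "div", classes: ["ad"], children: [{ type: "text", text: "hidden" }] },
    { type: "text", text: "caption" },
  ],
  withDefaults({ ignoreClass: ["ad"] }),
);
console.log(grouped.map((n) => n.type).join(",")); // imageGrid,text
```

Grouping works per sibling list in this sketch, so images separated by text end up in separate grids.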
The HtmlContentView

Using the parser directly is a good choice, but you can also do it more simply with the <HtmlContentView> component from /app/ui/HtmlContentView.tsx.

There are two ways to use it: give it an HTML string directly, or give it a URL from which to load the HTML string from the backend server.

It accepts these props:

  • navigation: the React Navigator router instance
  • html: the HTML string to parse
  • source: a URL to load the HTML string from (use it with the getContentFromResource prop)
  • getContentFromResource: a function that returns the HTML string from the remote server's response
  • loadingComp: a custom component displayed until the content is loaded (by default, a simple spinner is shown)
  • opts: any of the options accepted by HtmlParserRN.

If the content can't be loaded or parsed correctly, the component shows an error message.
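The load-or-parse behaviour described above can be sketched as a small async function, independent of React. This is an assumption about how such a component might work, not HtmlContentView's code: `resolveContent`, `ContentState` and the injected `fetchResource` function are hypothetical. A component would show `loadingComp` while the promise is pending, hand `html` to the parser once loaded, and show an error message otherwise.

```typescript
// Hypothetical sketch of how a component like <HtmlContentView> could resolve its content:
// use `html` directly, or fetch `source` and extract the HTML with `getContentFromResource`.
// `fetchResource` stands in for the app's HTTP layer; all names here are assumptions.

type ContentState =
  | { status: "loading" }                // the component would render loadingComp meanwhile
  | { status: "loaded"; html: string }   // ready to hand over to the parser
  | { status: "error"; reason: string }; // the component would render an error message

interface ContentPropsSketch<TResponse> {
  html?: string;
  source?: string;
  getContentFromResource?: (response: TResponse) => string;
}

async function resolveContent<TResponse>(
  props: ContentPropsSketch<TResponse>,
  fetchResource: (url: string) => Promise<TResponse>,
): Promise<ContentState> {
  if (props.html !== undefined) return { status: "loaded", html: props.html };
  if (!props.source || !props.getContentFromResource) {
    return { status: "error", reason: "needs html, or source with getContentFromResource" };
  }
  try {
    const response = await fetchResource(props.source);
    return { status: "loaded", html: props.getContentFromResource(response) };
  } catch (e) {
    return { status: "error", reason: e instanceof Error ? e.message : String(e) };
  }
}

// Usage with a fake fetcher whose response wraps the HTML in a `body` field.
resolveContent(
  { source: "/wiki/page", getContentFromResource: (r: { body: string }) => r.body },
  async (url: string) => ({ body: `<p>Loaded from ${url}</p>` }),
).then((state) => console.log(state.status === "loaded" ? state.html : state.status)); // <p>Loaded from /wiki/page</p>
```

Injecting the fetcher keeps the sketch testable without a network and leaves the choice of HTTP client to the app.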
