基于JSON Schema的HTML解析器

前言

写这篇文章主要是近年来前端服务端渲染框架层出不穷，开发的选择眼花缭乱。最近有个想法，把HTML的DOM树抽象成用JSON schema 语言描述，可以用于HTML渲染描述中间层抹平框架的差异。

注意：只是作为中间层抹平差异，不是代替框架 。更好地抽象处理HTML的DOM节点在前端渲染，或者服务端渲染。

如果觉得文章啰嗦，我已经实现了该Node模块 html-schema-parser，可以直接安装使用。
或点击 github 查看源码https://github.com/chenshenhai/html-schema-parser/

具体的渲染设想如下：

输入 JSON Schema

{
  tag: 'div',
  attribute: {
    style: 'background:#f0f0f0;'
  },
  content: [
    {
      tag: 'ul',
      content: [
        {
          tag: 'li',
          content: ['item 001']
        }, {
          tag: 'li',
          content: ['item 002']
        }
      ]
    }
  ],
}

输出HTML

<div style="background:#f0f0f0;">
    <ul>
        <li>item 001</li>
        <li>item 002</li>
        <li>item 003</li>
    </ul>
</div>

HTML的基本要素

如果要实现以上解析功能，就先抽象出前端、后端的HTML渲染的元素。

tag类型 tag
tag属性 attribute
tag内容（文本和子节点）content

Schema解析实现

设定tag

合法的tag

const legalTags = [
  'html', 'head', 'title', 'base', 'link', 'meta', 'style', 'script',
  'noscript', 'body', 'section', 'nav', 'article', 'aside', 'h1', 'h2',
  'h3', 'h4', 'h5', 'h6', 'hgroup', 'header', 'footer', 'address', 'main',
  'p', 'hr', 'pre', 'blockquote', 'ol', 'ul', 'li', 'dl', 'dt', 'dd',
  'figure', 'figcaption', 'div', 'a', 'em', 'strong', 'small', 's', 'cite',
  'q', 'dfn', 'abbr', 'data', 'time', 'code', 'var', 'samp', 'kbd', 'sub',
  'sup', 'i', 'b', 'u', 'mark', 'ruby', 'rt', 'rp', 'bdi', 'bdo', 'span', 'br',
  'wbr', 'ins', 'del', 'img', 'iframe', 'embed', 'object', 'param', 'video',
  'audio', 'source', 'track', 'canvas', 'map', 'area', 'svg', 'math',
  'table', 'caption', 'colgroup', 'col', 'tbody', 'thead', 'tfoot', 'tr',
  'td', 'th', 'form', 'fieldset', 'legend', 'label', 'input', 'button',
  'select', 'datalist', 'optgroup', 'option', 'textarea', 'keygen',
  'output', 'progress', 'meter', 'details', 'summary', 'command', 'menu'
]

不闭合的tag

const notClosingTags = {
  'area': true,
  'base': true,
  'br': true,
  'col': true,
  'hr': true,
  'img': true,
  'input': true,
  'link': true,
  'meta': true,
  'param': true,
  'embed': true
}

解析tag的schema

解析schema入口

function parseSchema (schema) {
    let html = ''
    
    if (isJSON(schema)) {
      // 如果是JSON类型，就直接用tag类型解析
      html = parseTag(schema)
    } else if (isArray(schema)) {
      // 如果是JSON类型，就直接用tag上下文类型解析
      html = parseContent(schema)
    }
    return html
}

解析tag类型

function parseTag (schema) {
  let html = ''
  if (isJSON(schema) !== true) {
    return html
  }
  let tag = schema.tag || 'div'
  const content = schema.content
  // 检查是否为合法的tag, 如果不是，默认为div
  if (tags.legalTags.indexOf(tag) < 0) {
    tag = 'div'
  }
  const attrStr = parseAttribute(schema.attribute)
  // 判断是否为闭合tag
  if (tags.notClosingTags[tag] === true) {
    // 如果是闭合的tag
    html = `<${tag} ${attrStr} />`
  } else {
    // 如果是非闭合tag
    html = `<${tag} ${attrStr} >${parseContent(content)}</${tag}>`
  }
  return html
}

解析tag内容

// 内容的类型为 Array
// Array的内容为字符串，或者tag类型的JSON
function parseContent (content) {
  let html = ''
  if (isArray(content) !== true) {
    return html
  }
  for (let i = 0; i < content.length; i++) {
    const item = content[i]
    if (isJSON(item) === true) {
      html += parseTag(item)
    } else if (isString(item)) {
      html += item
    }
  }
  return html
}

参考

https://github.com/caolan/pithy

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

04.md

04.md

基于JSON Schema的HTML解析器

前言

HTML的基本要素

Schema解析实现

设定tag

解析tag的schema

参考

Files

04.md

Latest commit

History

04.md

File metadata and controls

基于JSON Schema的HTML解析器

前言

HTML的基本要素

Schema解析实现

设定tag

解析tag的schema

参考