Rework html-entities, new API and performance improvements.
mdevils committed Dec 28, 2020
1 parent c70a01c commit 78b504c
Showing 22 changed files with 9,698 additions and 286 deletions.
1 change: 1 addition & 0 deletions .eslintignore
@@ -0,0 +1 @@
src/named-references.ts
59 changes: 59 additions & 0 deletions .eslintrc
@@ -0,0 +1,59 @@
{
  "extends": [
    "eslint:recommended",
    "plugin:@typescript-eslint/recommended",
    "prettier/@typescript-eslint",
    "plugin:prettier/recommended"
  ],
  "plugins": ["@typescript-eslint", "prettier", "import"],
  "env": {
    "mocha": true
  },
  "rules": {
    "@typescript-eslint/explicit-function-return-type": "off",
    "@typescript-eslint/explicit-module-boundary-types": "off",
    "@typescript-eslint/no-use-before-define": [
      "error",
      {
        "functions": false,
        "classes": false
      }
    ],
    "@typescript-eslint/consistent-type-definitions": ["error", "interface"],
    "@typescript-eslint/no-unused-vars": [
      "error",
      {
        "vars": "all",
        "args": "after-used",
        "ignoreRestSiblings": true,
        "argsIgnorePattern": "^_"
      }
    ],
    "@typescript-eslint/no-explicit-any": "off",
    "@typescript-eslint/no-empty-interface": "error",
    "@typescript-eslint/no-non-null-assertion": "off",

    "prettier/prettier": ["error", {"singleQuote": true}],
    "no-irregular-whitespace": "off",
    "no-control-regex": "off",
    "no-duplicate-imports": ["error", {"includeExports": true}],
    "arrow-body-style": ["error", "as-needed"],
    "no-restricted-globals": ["error", "name", "toString", "pending"],
    "import/order": ["error", {
      "groups": ["builtin", "external", "internal", "sibling"],
      "pathGroupsExcludedImportTypes": ["parent", "sibling", "index"],
      "alphabetize": {
        "order": "asc",
        "caseInsensitive": true
      }
    }]
  },
  "settings": {
    "import/resolver": {
      "node": {
        "extensions": [".ts"]
      }
    }
  },
  "parser": "@typescript-eslint/parser"
}
5 changes: 0 additions & 5 deletions .npmignore

This file was deleted.

27 changes: 27 additions & 0 deletions .prettierrc
@@ -0,0 +1,27 @@
{
  "arrowParens": "always",
  "bracketSpacing": false,
  "endOfLine": "lf",
  "jsxBracketSameLine": true,
  "jsxSingleQuote": true,
  "parser": "typescript",
  "printWidth": 120,
  "proseWrap": "preserve",
  "semi": true,
  "singleQuote": true,
  "tabWidth": 4,
  "trailingComma": "none",

  "overrides": [
    {
      "files": ["*.json", ".eslintrc", ".prettierrc"],
      "options": {
        "parser": "json",
        "singleQuote": false,
        "trailingComma": "none",
        "useTabs": false,
        "tabWidth": 2
      }
    }
  ]
}
132 changes: 103 additions & 29 deletions README.md
@@ -4,7 +4,7 @@ html-entities
[![Build Status](https://travis-ci.org/mdevils/node-html-entities.svg?branch=master)](https://travis-ci.org/mdevils/node-html-entities)
[![Coverage Status](https://coveralls.io/repos/mdevils/node-html-entities/badge.svg?branch=master&service=github)](https://coveralls.io/github/mdevils/node-html-entities?branch=master)

Fast html entities library.
Fastest html entities library.


Installation
@@ -17,47 +17,121 @@ $ npm install html-entities
Usage
-----

**XML entities**
### encode(text, options)

HTML validity and XSS attack prevention you can achieve from XmlEntities class.
Encodes text, replacing HTML special characters (`<>&"'`) plus other character ranges depending on the value of the `mode` option.

```javascript
const Entities = require('html-entities').XmlEntities;
```
import {encode} from 'html-entities';
encode('<>"\'&©∆')
// -> '&lt;&gt;&quot;&apos;&amp;©∆'
const entities = new Entities();
encode('<>"\'&©∆', {mode: 'nonAsciiPrintable'})
// -> '&lt;&gt;&quot;&apos;&amp;&copy;&#8710;'
console.log(entities.encode('<>"\'&©®')); // &lt;&gt;&quot;&apos;&amp;©®
console.log(entities.encodeNonUTF('<>"\'&©®')); // &lt;&gt;&quot;&apos;&amp;&#169;&#174;
console.log(entities.encodeNonASCII('<>"\'&©®')); // <>"\'&©®
console.log(entities.decode('&lt;&gt;&quot;&apos;&amp;&copy;&reg;&#8710;')); // <>"'&&copy;&reg;∆
encode('<>"\'&©∆', {mode: 'nonAsciiPrintable', level: 'xml'})
// -> '&lt;&gt;&quot;&apos;&amp;&#169;&#8710;'
```

**All HTML entities encoding/decoding**
Options:

#### level

* `all` is an alias for `html5` (default).
* `html5` uses `HTML5` named references.
* `html4` uses `HTML4` named references.
* `xml` uses `XML` named references.

#### mode

* `specialChars` encodes only HTML special characters (default).
* `nonAscii` encodes HTML special characters and everything outside of the [ASCII character range](https://en.wikipedia.org/wiki/ASCII).
* `nonAsciiPrintable` encodes HTML special characters and everything outside of the [ASCII printable characters](https://en.wikipedia.org/wiki/ASCII#Printable_characters).
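
The three `mode` values can be read as character classes. The sketch below uses a hypothetical helper (`isEncodedInMode` is not part of this library) that takes the descriptions above literally. The library's real ranges may differ, for example in how control characters such as tab or newline are treated.

```javascript
// Hypothetical helper, NOT the library's implementation: it only illustrates
// which characters each `mode` value is described as targeting.
const SPECIAL_CHARS = /[<>&"']/;

function isEncodedInMode(char, mode) {
    const code = char.codePointAt(0);
    if (SPECIAL_CHARS.test(char)) {
        return true; // HTML special characters are selected in every mode
    }
    switch (mode) {
        case 'specialChars':
            return false;
        case 'nonAscii':
            return code > 0x7f; // outside the ASCII range
        case 'nonAsciiPrintable':
            return code < 0x20 || code > 0x7e; // outside printable ASCII (space to tilde)
        default:
            throw new Error(`Unknown mode: ${mode}`);
    }
}
```

Under these assumptions, `©` is left alone in `specialChars` mode but selected in both other modes, while a control character such as `\x07` is selected only by `nonAsciiPrintable`.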

```javascript
const Entities = require('html-entities').AllHtmlEntities;
#### numeric

const entities = new Entities();
* `decimal` uses decimal numbers when encoding HTML entities, e.g. `&#169;` (default).
* `hexadecimal` uses hexadecimal numbers when encoding HTML entities, e.g. `&#xa9;`.
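
The two `numeric` forms differ only in how the code point is written. A minimal sketch follows; `toNumericReference` is a hypothetical helper used for illustration and is not exported by this library.

```javascript
// Hypothetical helper: builds a numeric character reference for the first
// code point of `char`, in decimal (default) or hexadecimal form.
function toNumericReference(char, numeric = 'decimal') {
    const codePoint = char.codePointAt(0);
    return numeric === 'hexadecimal'
        ? `&#x${codePoint.toString(16)};`
        : `&#${codePoint};`;
}

toNumericReference('©');
// -> '&#169;'

toNumericReference('©', 'hexadecimal');
// -> '&#xa9;'
```

Both forms refer to the same character (U+00A9), so the choice only affects how the encoded output looks.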


### decode(text, options)

Decodes text, replacing entities with characters. Unknown entities are left as is.

console.log(entities.encode('<>"&©®∆')); // &lt;&gt;&quot;&amp;&copy;&reg;∆
console.log(entities.encodeNonUTF('<>"&©®∆')); // &lt;&gt;&quot;&amp;&copy;&reg;&#8710;
console.log(entities.encodeNonASCII('<>"&©®∆')); // <>"&©®&#8710;
console.log(entities.decode('&lt;&gt;&quot;&amp;&copy;&reg;')); // <>"&©®
```
import {decode} from 'html-entities';
**Available classes**
decode('&lt;&gt;&quot;&apos;&amp;&#169;&#8710;')
// -> '<>"\'&©∆'
```javascript
const XmlEntities = require('html-entities').XmlEntities, // <>"'& + &#...; decoding
Html4Entities = require('html-entities').Html4Entities, // HTML4 entities.
Html5Entities = require('html-entities').Html5Entities, // HTML5 entities.
AllHtmlEntities = require('html-entities').AllHtmlEntities; // Synonym for HTML5 entities.
decode('&copy', {level: 'html5'})
// -> '©'
decode('&copy', {level: 'xml'})
// -> '&copy'
```

Options:

#### level

* `all` is an alias for `html5` (default).
* `html5` uses `HTML5` named references.
* `html4` uses `HTML4` named references.
* `xml` uses `XML` named references.

#### scope

* `body` emulates browser behavior when parsing tag bodies: entities without a semicolon are also replaced (default).
* `attribute` emulates browser behavior when parsing tag attributes: entities without a semicolon are replaced when not followed by an equals sign `=`.
* `strict` ignores entities without a semicolon.

Performance
-----------

Statistically significant comparison with other libraries, measured with `benchmark.js`.
Results for this library are marked with `*`.
The source code of the benchmark is available at `benchmark/benchmark.ts`.

```
HTML
Encode test
* #1: html-entities.encode - html5, specialChars x 1,182,489 ops/sec ±0.65% (93 runs sampled)
* #2: html-entities.encode - html5, nonAscii x 442,639 ops/sec ±1.49% (90 runs sampled)
* #3: html-entities.encode - html5, nonAsciiPrintable x 426,967 ops/sec ±1.07% (92 runs sampled)
#4: entities.encodeHTML5 x 127,785 ops/sec ±1.16% (94 runs sampled)
#5: he.encode x 113,690 ops/sec ±1.11% (89 runs sampled)
Decode test
* #1: html-entities.decode - html5, body x 358,440 ops/sec ±0.74% (89 runs sampled)
* #2: html-entities.decode - html5, attribute x 354,841 ops/sec ±1.54% (91 runs sampled)
* #3: html-entities.decode - html5, strict x 346,212 ops/sec ±1.79% (89 runs sampled)
#4: entities.decodeHTML4 x 288,765 ops/sec ±0.75% (96 runs sampled)
#5: entities.decodeHTML5 x 283,268 ops/sec ±0.96% (87 runs sampled)
#6: he.decode x 212,620 ops/sec ±2.63% (93 runs sampled)
XML
Encode test
* #1: html-entities.encode - xml, specialChars x 1,123,722 ops/sec ±1.79% (93 runs sampled)
#2: he.escape x 1,139,774 ops/sec ±3.95% (85 runs sampled)
* #3: html-entities.encode - xml, nonAscii x 434,552 ops/sec ±0.68% (95 runs sampled)
* #4: html-entities.encode - xml, nonAsciiPrintable x 409,857 ops/sec ±0.52% (93 runs sampled)
#5: entities.encodeXML x 292,893 ops/sec ±2.19% (93 runs sampled)
#6: entities.escape x 225,854 ops/sec ±2.37% (94 runs sampled)
Decode test
* #1: html-entities.decode - xml, body x 404,036 ops/sec ±0.45% (94 runs sampled)
* #2: html-entities.decode - xml, strict x 402,978 ops/sec ±0.53% (94 runs sampled)
#3: entities.decodeXMLStrict x 393,540 ops/sec ±3.02% (91 runs sampled)
* #4: html-entities.decode - xml, attribute x 389,117 ops/sec ±1.99% (91 runs sampled)
#5: entities.decodeXML x 389,969 ops/sec ±3.47% (87 runs sampled)
#6: he.unescape x 245,149 ops/sec ±1.28% (93 runs sampled)
```

Supports four methods for every class:
License
-------

* encode — encodes, replacing characters to its entity representations. Ignores UTF characters with no entity representation.
* encodeNonUTF — encodes, replacing characters to its entity representations. Inserts numeric entities for UTF characters.
* encodeNonASCII — encodes, replacing only non-ASCII characters to its numeric entity representations.
* decode — decodes, replacing entities to characters. Unknown entities are left as is.
MIT

3 comments on commit 78b504c

@ralphtheninja

What was the breaking change when publishing v2.0.0? Since there's no changelog, it's quite difficult to know exactly what was changed, without reading this commit and understanding the complete module.

@mdevils
Owner Author


@ralphtheninja

Thanks a bunch!
