Parse HTML character references.
- What is this?
- When should I use this?
- Install
- Use
- API
- Types
- Compatibility
- Security
- Related
- Contribute
- License
This is a small and powerful decoder of HTML character references (often called entities).
You can use this for spec-compliant decoding of character references. It's small and fast enough to do that well. You can also use this when making a linter, because it emits distinct warnings, each with a reason explaining why and positional info on where it happened.
This package is ESM only. In Node.js (version 14.14+, 16.0+), install with npm:
npm install parse-entities
In Deno with esm.sh:
import {parseEntities} from 'https://esm.sh/parse-entities@3'
In browsers with esm.sh:
<script type="module">
import {parseEntities} from 'https://esm.sh/parse-entities@3?bundle'
</script>
import {parseEntities} from 'parse-entities'
console.log(parseEntities('alpha &amp bravo'))
// => alpha & bravo
console.log(parseEntities('charlie &copycat; delta'))
// => charlie ©cat; delta
console.log(parseEntities('echo &copy; foxtrot &#8800; golf &#x1D306; hotel'))
// => echo © foxtrot ≠ golf 𝌆 hotel
This package exports the identifier `parseEntities`.
There is no default export.
`parseEntities(value[, options])` parses HTML character references in `value` (`string`).
`options`: configuration (optional).
`options.additional`: additional character to accept (`string?`, default: `''`).
This allows other characters, without error, when following an ampersand.
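`options.attribute`: whether to parse `value` as an attribute value (`boolean?`, default: `false`).
This follows the spec's rules for attributes: a named reference without a semicolon that is followed by `=` or an ASCII alphanumeric is left undecoded.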
`options.nonTerminated`: whether to allow nonterminated references (`boolean`, default: `true`).
For example, `&copycat` decodes to `©cat`.
This behavior is compliant with the spec but can lead to unexpected results.
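A minimal sketch of why `attribute` and `nonTerminated` matter for named references that lack a semicolon. It is not this package's implementation: `decodeNamed` and the four-entry `legacy` map are hypothetical stand-ins for the full table, and only the longest-prefix rule and the spec's attribute rule are modeled.

```javascript
// Hypothetical four-entry subset of the named reference table; real
// decoders use the full list (for example from character-entities-legacy).
const legacy = new Map([
  ['amp', '&'],
  ['copy', '\u00A9'],
  ['gt', '>'],
  ['lt', '<']
])

// Decode one named reference at the start of `input` (which begins with `&`),
// returning the decoded text followed by the rest of `input`.
// The two flags mirror the `attribute` and `nonTerminated` options.
function decodeNamed(input, {attribute = false, nonTerminated = true} = {}) {
  if (input[0] !== '&') return input

  // Terminated form `&name;` with a known name: always decoded.
  const terminated = /^&([a-zA-Z0-9]+);/.exec(input)
  if (terminated && legacy.has(terminated[1])) {
    return legacy.get(terminated[1]) + input.slice(terminated[0].length)
  }

  if (!nonTerminated) return input

  // Otherwise, find the longest known name that prefixes the text after `&`.
  const letters = /^&([a-zA-Z0-9]*)/.exec(input)[1]
  for (let size = letters.length; size > 0; size--) {
    const name = letters.slice(0, size)
    if (!legacy.has(name)) continue
    const rest = input.slice(1 + size)
    // Attribute rule: leave `&copy=` or `&copyx` undecoded.
    if (attribute && /^[=a-zA-Z0-9]/.test(rest)) return input
    return legacy.get(name) + rest
  }

  return input
}

console.log(decodeNamed('&copycat;'))
// => ©cat;
console.log(decodeNamed('&copycat;', {attribute: true}))
// => &copycat;
console.log(decodeNamed('&copycat;', {nonTerminated: false}))
// => &copycat;
```

Using a `Map` rather than a plain object keeps names such as `constructor` from accidentally matching inherited properties.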
`options.position`: starting position of `value` (`Position` or `Point`, optional).
Useful when dealing with values nested in some sort of syntax tree.
The default is:
{line: 1, column: 1, offset: 0}
`options.warning`: error handler (`Function?`).
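`options.text`: text handler (`Function?`).
`options.reference`: reference handler (`Function?`).
`options.warningContext`: context used when calling `warning` (`*`, optional).
`options.textContext`: context used when calling `text` (`*`, optional).
`options.referenceContext`: context used when calling `reference` (`*`, optional).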
Returns `string`: the decoded value.
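A point counts lines and columns from 1 and offsets from 0, as the default `{line: 1, column: 1, offset: 0}` shows. The hypothetical `advance` helper below (not exported by this package) sketches how a point moves across a string, which is the arithmetic behind passing a nested value's start as `options.position`. It assumes each UTF-16 code unit is one column and that only `\n` ends a line; the package's exact counting may differ.

```javascript
// Hypothetical helper (not part of parse-entities): move a point past `text`.
// Lines and columns start at 1, offsets at 0. Assumes each UTF-16 code unit
// is one column and only `\n` starts a new line.
function advance(point, text) {
  let {line, column, offset} = point
  for (let index = 0; index < text.length; index++) {
    if (text.charCodeAt(index) === 10 /* `\n` */) {
      line++
      column = 1
    } else {
      column++
    }
    offset++
  }
  return {line, column, offset}
}

// A value that starts at line 3, column 9 (offset 42) of a larger document.
console.log(advance({line: 3, column: 9, offset: 42}, 'a &amp;\nb'))
// => { line: 4, column: 2, offset: 51 }
```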
`warning(reason, point, code)`: error handler, called with:
- `this` (`*`): refers to `warningContext` when given to `parseEntities`
- `reason` (`string`): human-readable reason for emitting a parse error
- `point` (`Point`): place where the error occurred
- `code` (`number`): machine-readable code for the error
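A sketch of a lint-style `warning` handler, assuming it is passed as `parseEntities(value, {warning, warningContext: messages})` so that `this` is the `messages` array. Here the handler is called by hand, with a made-up reason and point, only to show the argument shapes; the message format is an arbitrary choice.

```javascript
// Hypothetical lint-style handler. With
// parseEntities(value, {warning, warningContext: messages}),
// `this` inside the handler is the `messages` array.
function warning(reason, point, code) {
  this.push(`${point.line}:${point.column}: ${reason} (code ${code})`)
}

// Calling the handler by hand, with a made-up reason and point,
// only to show the shape of the arguments parseEntities passes.
const messages = []
warning.call(messages, 'Missing semicolon', {line: 1, column: 5, offset: 4}, 1)

console.log(messages[0])
// => 1:5: Missing semicolon (code 1)
```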
The following codes are used:
| Code | Example | Note |
|---|---|---|
| 1 | `foo &amp bar` | Missing semicolon (named) |
| 2 | `foo &#123 bar` | Missing semicolon (numeric) |
| 3 | `Foo &bar baz` | Empty (named) |
| 4 | `Foo &#` | Empty (numeric) |
| 5 | `Foo &bar; baz` | Unknown (named) |
| 6 | `Foo &#128; baz` | Disallowed reference |
| 7 | `Foo &#xD800; baz` | Prohibited: outside permissible Unicode range |
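To make the numeric codes concrete, here is a simplified sketch of decoding a single numeric reference. It is not this package's code: `decodeNumeric` is hypothetical, labels its results with the code numbers from the table, and skips the extra warnings the spec raises for other control and noncharacter code points. The replacement map is the spec's Windows-1252 table for C1 references, plus NUL mapped to U+FFFD.

```javascript
// Replacements the HTML spec applies to NUL and to C1 references (0x80-0x9F,
// mapped via Windows-1252); code points absent here decode as themselves.
const replacements = new Map([
  [0x00, 0xfffd],
  [0x80, 0x20ac], [0x82, 0x201a], [0x83, 0x0192], [0x84, 0x201e],
  [0x85, 0x2026], [0x86, 0x2020], [0x87, 0x2021], [0x88, 0x02c6],
  [0x89, 0x2030], [0x8a, 0x0160], [0x8b, 0x2039], [0x8c, 0x0152],
  [0x8e, 0x017d], [0x91, 0x2018], [0x92, 0x2019], [0x93, 0x201c],
  [0x94, 0x201d], [0x95, 0x2022], [0x96, 0x2013], [0x97, 0x2014],
  [0x98, 0x02dc], [0x99, 0x2122], [0x9a, 0x0161], [0x9b, 0x203a],
  [0x9c, 0x0153], [0x9e, 0x017e], [0x9f, 0x0178]
])

// Decode one numeric reference at the start of `source` (`&#123;`, `&#x7B;`).
// Returns the decoded text (plus the rest of `source`) and the codes
// this sketch assigns, numbered as in the table.
function decodeNumeric(source) {
  const match = /^&#(?:[xX]([0-9a-fA-F]*)|([0-9]*))(;?)/.exec(source)
  if (!match) return {value: source, codes: []}

  const hex = match[1] !== undefined
  const digits = hex ? match[1] : match[2]
  // Code 4: `&#` (or `&#x`) without digits is left as-is.
  if (digits === '') return {value: source, codes: [4]}

  const codes = []
  // Code 2: the reference is not terminated by `;`.
  if (match[3] !== ';') codes.push(2)

  const point = Number.parseInt(digits, hex ? 16 : 10)
  let value
  if (point > 0x10ffff || (point >= 0xd800 && point <= 0xdfff)) {
    // Code 7: surrogates and values past U+10FFFF become U+FFFD.
    codes.push(7)
    value = '\uFFFD'
  } else if (replacements.has(point)) {
    // Code 6: disallowed reference, replaced per the spec's table.
    codes.push(6)
    value = String.fromCodePoint(replacements.get(point))
  } else {
    value = String.fromCodePoint(point)
  }

  return {value: value + source.slice(match[0].length), codes}
}

console.log(decodeNumeric('&#128; baz'))
// => { value: '€ baz', codes: [ 6 ] }
console.log(decodeNumeric('&#123 bar'))
// => { value: '{ bar', codes: [ 2 ] }
```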
`text(value, position)`: text handler, called with:
- `this` (`*`): refers to `textContext` when given to `parseEntities`
- `value` (`string`): string of content
- `position` (`Position`): place where `value` starts and ends
`reference(value, position, source)`: character reference handler, called with:
- `this` (`*`): refers to `referenceContext` when given to `parseEntities`
- `value` (`string`): decoded character reference
- `position` (`Position`): place where `source` starts and ends
- `source` (`string`): raw source of the character reference
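The two handlers split the input into alternating text chunks and references. The toy walker below is a hypothetical illustration of that callback shape only, not how this package tokenizes: it knows just `&amp;` and `&copy;`, ignores nonterminated and numeric references, and assumes single-line input so each column is its offset plus 1.

```javascript
// Hypothetical toy walker (not this package's code) that mimics the handler
// shapes: text(value, position) and reference(value, position, source).
const names = new Map([['amp', '&'], ['copy', '\u00A9']])

// Single-line assumption: column is always offset + 1.
function pointAt(offset) {
  return {line: 1, column: offset + 1, offset}
}

function walk(value, {text = () => {}, reference = () => {}} = {}) {
  const pattern = /&([a-zA-Z0-9]+);/g
  let last = 0
  let match
  while ((match = pattern.exec(value)) !== null) {
    if (!names.has(match[1])) continue // unknown names stay part of the text
    const start = match.index
    const end = start + match[0].length
    if (start > last) {
      text(value.slice(last, start), {start: pointAt(last), end: pointAt(start)})
    }
    reference(names.get(match[1]), {start: pointAt(start), end: pointAt(end)}, match[0])
    last = end
  }
  if (last < value.length) {
    text(value.slice(last), {start: pointAt(last), end: pointAt(value.length)})
  }
}

// Record each call as [kind, value, (source,) startOffset, endOffset].
const events = []
walk('a &amp; b &copy;', {
  text(value, position) {
    events.push(['text', value, position.start.offset, position.end.offset])
  },
  reference(value, position, source) {
    events.push(['reference', value, source, position.start.offset, position.end.offset])
  }
})
```

Afterwards, `events` holds the text `'a '` (offsets 0 to 2), `&` from `&amp;` (2 to 7), the text `' b '` (7 to 10), and `©` from `&copy;` (10 to 16).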
This package is fully typed with TypeScript.
It exports the additional types Options, WarningHandler,
ReferenceHandler, and TextHandler.
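In plain JavaScript, these types can be used through JSDoc comments. A sketch, assuming `parse-entities` is installed so that TypeScript (for example with `checkJs`) can resolve its types; the handler body and option values are only examples.

```javascript
// Type-only imports in JSDoc comments: TypeScript reads them,
// while JavaScript ignores them at runtime.
/**
 * @typedef {import('parse-entities').Options} Options
 * @typedef {import('parse-entities').WarningHandler} WarningHandler
 */

/** @type {WarningHandler} */
const warning = function (reason, point, code) {
  // Example reporting format; any side effect works here.
  console.warn(`${point.line}:${point.column}: ${reason} [${code}]`)
}

/** @type {Options} */
const options = {nonTerminated: false, warning}
```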
This package is at least compatible with all maintained versions of Node.js. As of now, that is Node.js 14.14+ and 16.0+. It also works in Deno and modern browsers.
This package is safe: it matches the HTML spec to parse character references.
- wooorm/stringify-entities: encode HTML character references
- wooorm/character-entities: info on character references
- wooorm/character-entities-html4: info on HTML4 character references
- wooorm/character-entities-legacy: info on legacy character references
- wooorm/character-reference-invalid: info on invalid numeric character references
Yes please! See How to Contribute to Open Source.
MIT © Titus Wormer