The HTML Highlighter is a JavaScript module that solves these problems:
- Display colorful highlights on words in a live Web page that are determined by either or both of these sources:
  - User selections identified by a user dragging their pointer over a portion of a page, possibly covering multiple tags in the DOM tree.
  - Machine selections identified by a program, which might run in the browser or in a server-side environment that processes the HTML and text of a page to decide which portions of content should be marked.
- Provide the character offsets of these selections to either JavaScript or backend tools. StreamCorpus Pipeline is being extended to translate between the relative offsets generated by HTML Highlighter and the absolute character offsets used by many backend text processing tools.
- Provide objects isomorphic to JavaScript's `Range` object, which has character offsets relative to DOM nodes identified by XPaths:

      {
        start: {
          xpath: <string>, // unique address to DOM node
          offset: <int>    // relative character offset
        },
        end: {
          xpath: <string>, // unique address to DOM node
          offset: <int>    // relative character offset
        }
      }

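The translation between relative and absolute offsets mentioned above can be illustrated with a minimal sketch. It is not the algorithm used by HTML Highlighter or StreamCorpus Pipeline; `toAbsoluteOffsets` and the `textNodes` input are hypothetical names introduced here. The sketch assumes that each XPath addresses a text node, that the page's text nodes are available in document order, and that absolute offsets count JavaScript string units (UTF-16 code units), which may differ from what a given backend tool counts.

```javascript
// Hypothetical helper, not part of HTML Highlighter or StreamCorpus Pipeline.
// `descriptor` follows the { start: { xpath, offset }, end: { xpath, offset } }
// shape shown above. `textNodes` is an assumed input: the page's text nodes in
// document order, each given as { xpath, text }.
function toAbsoluteOffsets(descriptor, textNodes) {
  // Record the absolute offset at which each text node begins.
  const starts = new Map();
  let position = 0;
  for (const node of textNodes) {
    starts.set(node.xpath, position);
    position += node.text.length;
  }

  // Absolute offset = start of the addressed node + relative offset within it.
  const resolve = (point) => {
    if (!starts.has(point.xpath)) {
      throw new Error(`Unknown xpath: ${point.xpath}`);
    }
    return starts.get(point.xpath) + point.offset;
  };

  return { start: resolve(descriptor.start), end: resolve(descriptor.end) };
}

// Example page text: "Hello brave world", split across three text nodes.
const textNodes = [
  { xpath: '/html/body/p[1]/text()[1]', text: 'Hello ' },
  { xpath: '/html/body/p[1]/b[1]/text()[1]', text: 'brave' },
  { xpath: '/html/body/p[1]/text()[2]', text: ' world' },
];

// A selection covering "brave world", which spans two DOM nodes.
const descriptor = {
  start: { xpath: '/html/body/p[1]/b[1]/text()[1]', offset: 0 },
  end: { xpath: '/html/body/p[1]/text()[2]', offset: 6 },
};

console.log(toAbsoluteOffsets(descriptor, textNodes)); // { start: 6, end: 17 }
```

Keeping the per-node start positions in a map makes each lookup constant-time, so a backend could translate many descriptors against the same page after a single pass over its text.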
The inline comments and class documentation are sufficient for a JavaScript programmer to jump in and start using the module. To see an example, run:

    git clone git://github.com/dossier/html-highlighter
    cd html-highlighter
    $BROWSER examples/index.html