Skip to content

JavaScript implementation of the Knuth-Plass linebreaking algorithm

Notifications You must be signed in to change notification settings

robertknight/tex-linebreak

Repository files navigation

tex-linebreak

npm version

tex-linebreak is a JavaScript library for laying out justified text as you would find in a newspaper, book or technical paper. It implements the Knuth-Plass line-breaking algorithm, as used by TeX.

Introduction

Most text on the web is presented with "ragged-right" margins, as opposed to the justified text you would find in eg. a scientific paper or newspaper. Text can be justified in web pages using text-align: justify. However this option alone tends to result in large   spaces    between words which is distracting to read. This is due to the use of "first fit" line-breaking algorithms where the browser considers only the current line when finding the next breakpoint. Some browsers support hyphenation via hyphens: auto which reduces this effect. However the first-fit approach can still produce wide lines and it can also produce more hyphenated lines than necessary.

The Knuth-Plass algorithm on the other hand optimizes the spacing between words over the whole paragraph, seeking to minimize the overall "badness" of the layout. This factor depends on the amount by which spaces have been shrunk or stretched and the number of hyphenated lines. The benefits of this approach are greater when rendering narrower columns of text (eg. on small screens).

This table compares the same text rendered in the same environment (font, font size, device width, margins) using CSS justification, CSS justification + hyphenation and this library:

Safari: text-align: justify Chrome: text-align: justify; hyphens: auto _tex-linebreak_
CSS justification produces large spaces on the second and penultimate lines. Enabling hyphenation using `hyphens: auto` in browsers that support it (as of 2018-04-07 this appears to be only Chrome) produces better output but still produces wide lines. The TeX algorithm in contrast hyphenates fewer lines and avoids excessive spacing between words.

tex-linebreak has no dependencies on a particular JS environment (browser, Node) or render target (<canvas>, HTML elements, PDF).

Try it out

The easiest way to see what the library can do is to install the bookmarklet and activate it on an existing web page, such as this Medium article.

It will justify and apply hyphenation to the content of any paragraph (<p>) elements on the page. The difference is more beneficial on smaller screens, so try in your browser's responsive design mode.

Note that the bookmarklet does not work on sites that use Content Security Policy to restrict where scripts can be loaded from.

Usage

First, add the tex-linebreak package to your dependencies:

npm install tex-linebreak

The library has low-level APIs which implement the core line-breaking and positioning algorithm, as well as higher-level APIs that provide a convenient way to justify existing HTML content.

Low-level APIs

The low-level APIs breakLines and positionItems work with generic "box" (typeset material), "glue" (spaces with flexible sizing) and "penalty" items. Typically "boxes" are words, "glue" items are spaces and "penalty" items represent hyphenation points or the end of a paragraph. However you can use them to lay out arbitrary content.

import { layoutItemsFromString, breakLines, positionItems } from 'tex-linebreak';

// Convert your text to a set of "box", "glue" and "penalty" items used by the
// line-breaking process.
//
// "Box" items are things (typically words) to typeset.
// "Glue" items are spaces that can stretch or shrink or be a breakpoint.
// "Penalty" items are possible breakpoints (hyphens, end of a paragraph etc.).
//
// `layoutItemsFromString` is a helper that takes a string and a function to
// measure the width of a piece of that string and returns a suitable set of
// items.
const measureText = text => text.length * 5;
const items = layoutItemsFromString(yourText, measureText);

// Find where to insert line-breaks in order to optimally lay out the text.
const lineWidth = 200;
const breakpoints = breakLines(items, lineWidth)

// Compute the (xOffset, line number) at which to draw each box item.
const positionedItems = positionItems(items, lineWidth, breakpoints);

positionedItems.forEach(pi => {
  const item = items[pi.item];

  // Add code to draw `item.text` at `(box.xOffset, box.line)` to whatever output
  // you want, eg. `<canvas>`, HTML elements with spacing created using CSS,
  // WebGL, ...
});

High-level APIs

The high-level APIs provide convenience methods for justifying content in existing HTML elements and laying out justified lines for rendering to HTML, canvas or other outputs. This includes support for hyphenation using the hypher library.

Justifying existing HTML content

The contents of an existing HTML element can be justified using the justifyContent function.

import enUsPatterns from 'hyphenation.en-us';
import { createHyphenator, justifyContent } from 'tex-linebreak';

const hyphenate = createHyphenator(enUsPatterns);
const paragraphs = Array.from(document.querySelectorAll('p'));
justifyContent(paragraphs, hyphenate);

After an element is justified, its layout will remain fixed until justifyContent is called again. In order to re-justify content in response to window size changes or other events, your code will need to listen for the appropriate events and re-invoke justifyContent.

Rendering text

For rendering justified text into a variety of targets (HTML, canvas, SVG, WebGL etc.), the layoutText helper can be used to lay out justifed text and obtain the positions which each word should be drawn at.

import { createHyphenator, layoutText } from 'tex-linebreak';

import enUsPatterns from 'hyphenation.en-us';

const hyphenate = createHyphenator(enUsPatterns);
const measure = word => word.length * 5;

const { items, positions } = layoutText(text, lineWidth, measure, hyphenate);

positions.forEach(pos => {
  // Draw text as in the above example for the low-level APIs
});

API reference

The source files in src/ have documentation in the form of TypeScript annotations.

Examples

For working code showing different ways to use this library, see the demos. You can build and run the demos using:

npm i -g http-server

git clone https://github.com/robertknight/tex-linebreak.git
cd tex-linebreak
yarn
yarn build-dev
http-server -c-1

Then navigate to http://127.0.0.1:8080/src/demos/layout.html (note that http-server may choose a different port).

Caveats

The library currently has a number of caveats:

  • It is not aware of floated content which can affect the available space in a paragraph to lay out text into. In the presence of floats lines can exceed the width of the paragraph.
  • Justification of existing HTML content relies on modifying the DOM to insert linebreaks and wrap text nodes in order to adjust inter-word spacing on each line. This can be in slow in large documents. Test it on your content to decide whether the overhead is acceptable for your use case. Also limit the number of elements which you apply justification to.

References

[1] D. E. Knuth and M. F. Plass, “Breaking paragraphs into lines,” Softw. Pract. Exp., vol. 11, no. 11, pp. 1119–1184, Nov. 1981.