A PHP function that truncates (shortens) a given HTML5 string to a max number of characters.
Example: truncate after 6 characters including the ellipsis:
<p><b>A</b> red ball.</p>
=> <p><b>A</b> red…</p>
Compatible with PHP 5.6 and 7+
Uses the mbstring PHP extension for UTF-8.
More than 240 unit tests (see or run: unittest.php)
The function is in truncateHTML.php; you can simply copy and paste it into your project.
- Quickly truncate most common HTML5 sources without using a full HTML parser (which is ~100x slower).
- Configurable ellipsis: …, ..., <a href="">More</a>, etc.
- Can include the length of the ellipsis in the truncated result.
- Supports self-closing tags like: <img>, <img/>, <newtag />
- Collapsing spaces: sequences of multiple spaces are counted only once (including <br>, &nbsp; and a few others)
- Does not count characters in invisible elements like: <head>, <script>, <noscript>, <style>, <!-- comments -->
- Supports HTML entities (&nbsp;, &hellip;, &quot;, etc.)
- Whole word: can truncate at the end of the last word instead of cutting in the middle of a word.
- Cut long words: can truncate in the middle of a word if it is very long (useful to truncate a URL)
- Truncates before the error in case of malformed HTML (like a mismatched closing tag)
- UTF-8 support (multibyte characters)
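To make the idea of "countable characters" in the list above more concrete, here is a rough sketch of how visible characters could be counted. It is an illustration only and not the code used in truncateHTML.php: the helper name approxCountableLength is hypothetical, and the regular expressions are deliberately simplistic (they would be fooled by a ">" inside an attribute value, for instance).

```php
<?php
// Hypothetical helper for illustration only; NOT part of truncateHTML.php.
// Approximates the number of visible ("countable") characters in an HTML string.
function approxCountableLength($html)
{
    // Drop invisible elements together with their content.
    $text = preg_replace('~<(head|script|noscript|style)\b[^>]*>.*?</\1\s*>~isu', '', $html);
    // Drop HTML comments.
    $text = preg_replace('~<!--.*?-->~su', '', $text);
    // Treat <br> as whitespace, then drop every remaining tag.
    $text = preg_replace('~<br\s*/?>~iu', ' ', $text);
    $text = preg_replace('~<[^>]+>~u', '', $text);
    // Decode entities so that each one counts as a single character.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse runs of whitespace (including non-breaking spaces) into one space.
    $text = preg_replace('~[\s\x{00A0}]+~u', ' ', $text);
    // Count characters, not bytes (requires the mbstring extension).
    return mb_strlen($text, 'UTF-8');
}

echo approxCountableLength('<p><b>A</b> red ball.</p>'), "\n";
echo approxCountableLength("A <br> \n\t long space!"), "\n";
```

Under these assumptions, the tags, comments and scripts contribute nothing to the count, and the run of spaces around <br> counts as a single character.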
// Example from the introduction:
truncateHTML(6, "<p><b>A</b> red ball.</p>");
// => "<p><b>A</b> red…</p>"
// Whole word:
truncateHTML(5, "<blockquote>A lumberjack</blockquote>");
// => "<blockquote>A…</blockquote>"
// Without whole word, without includeEllipsisLength:
truncateHTML(5, "<blockquote>A lumberjack</blockquote>", ['wholeWord' => false, 'includeEllipsisLength' => false]);
// => "<blockquote>A lum…</blockquote>"
// Whole word: example of cutting only long words:
truncateHTML(5, "<a href='https://php.net/docs.php'>https://php.net/docs.php</a>");
// => "…"
// Notice how wholeWord truncates before opening a tag that would be left empty.
truncateHTML(20, "<a href='https://php.net/docs.php'>https://php.net/docs.php</a>");
// => "<a href='https://php.net/docs.php'>https://php.net/doc…</a>"
// Comments, scripts and styles are not counted:
truncateHTML(3, "<script>$();</script><!-- Start div --><div>Hi</div><!-- End div --> More text.");
// => "<script>$();</script><!-- Start div --><div>Hi…</div>"
// Collapsing multiple spaces:
truncateHTML(6, "A <br> \n\t long space!");
// => "A <br> \n\t long…"
// Tag mismatch: truncates before the error:
truncateHTML(99, "Click</a>here</a>");
// => "Click…"
string truncateHTML(int $maxLength, string $html, array $options = [])
- $maxLength: the returned HTML will contain at most $maxLength countable characters. If negative, removes $maxLength countable characters from the end of $html.
- $html: the input HTML string that will be truncated.
- $options: (optional) an array of options:

| Option (with default value) | Description |
| --- | --- |
| 'ellipsis' => '…' (or: 'ellipsis' => '...') | The ellipsis that will be included. Can be an empty string, can contain HTML tags. ('…' is the horizontal ellipsis character, i.e. '...' as a single Unicode character.) If not using UTF-8 mode, the default value is '...' instead of '…'. |
| 'includeEllipsisLength' => true | Whether to include the length of the ellipsis in the length of the truncated result. |
| 'wholeWord' => true | When truncating, don't cut in the middle of a word. Instead, cut at the end of the last word. |
| 'cutWord' => 18 | When wholeWord is enabled, allows cutting long words after cutWord characters. Set to 0 or false to disable. |
| 'utf8' => true | Use UTF-8 mode. You should always use UTF-8 though. If utf8 is false, only ASCII-compatible single-byte encodings (such as Latin-1) are supported. For other encodings, use mb_convert_encoding to convert to UTF-8 and back. If UTF-8 is disabled, the default ellipsis is '...' instead of '…'. |
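The advice to convert other encodings to UTF-8 and back could look roughly like the sketch below. The wrapper name truncateInEncoding is hypothetical and not part of this project; only mb_convert_encoding is a real PHP function here. The truncating function is passed in as a callable so that the snippet runs on its own; in a real project you would presumably pass 'truncateHTML' after including truncateHTML.php.

```php
<?php
// Hypothetical wrapper for illustration; NOT part of truncateHTML.php.
// $truncate: any callable taking ($maxLength, $html) and returning a string,
//            for example 'truncateHTML' once truncateHTML.php is included.
// $encoding: the encoding of $html, e.g. 'SJIS'.
function truncateInEncoding(callable $truncate, $maxLength, $html, $encoding)
{
    // Convert the input to UTF-8 so the truncation can run in UTF-8 mode.
    $utf8 = mb_convert_encoding($html, 'UTF-8', $encoding);
    $truncated = call_user_func($truncate, $maxLength, $utf8);
    // Convert the result back to the original encoding.
    return mb_convert_encoding($truncated, $encoding, 'UTF-8');
}

// Stand-in truncator used only to keep this example self-contained.
$standIn = function ($maxLength, $html) {
    return mb_substr($html, 0, $maxLength, 'UTF-8');
};

$sjis = mb_convert_encoding('こんにちは世界', 'SJIS', 'UTF-8');
$result = truncateInEncoding($standIn, 3, $sjis, 'SJIS');
echo strlen($result), " bytes\n";
```

One caveat with this approach: if the target encoding has no horizontal ellipsis character (Latin-1, for example), the default '…' cannot be converted back, so setting 'ellipsis' => '...' is probably the safer choice in that case.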
XHTML: probably works in most cases, but is untested.
Not supported:
- Malformed HTML, badly nested tags, missing closing tags: it doesn't try to guess the correct fix (for this you would need a full HTML parser).
  Note: when meeting an unexpected closing tag, it always truncates before the closing tag (see the examples).
- Uncommon HTML code like:
  - HTML tags inside an HTML tag attribute: <img title="Hello<br>World">
  - The string </script> inside <script>code…</script>. For this you would need a full HTML parser, or a JavaScript parser. (Other tags are ok, but don't have a closing tag </script> in a JavaScript string or comment)
  - The string </style> inside <style>code…</style>. For this you would need a full HTML parser, or a CSS parser. (Other tags are ok, but don't have a closing tag </style> in a CSS comment)
- XML
- CDATA (deprecated in HTML5)
If you find more, please open an issue.
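If you want to check your input for the kind of tag mismatch described above before truncating, a small tag-stack check is one possible approach. The sketch below is an assumption-laden illustration, not the logic used by truncateHTML.php: the function name firstTagMismatchOffset and its short list of void elements are hypothetical, and it ignores comments, scripts and styles.

```php
<?php
// Hypothetical helper for illustration only; NOT part of truncateHTML.php.
// Returns the byte offset of the first closing tag that does not match the
// most recently opened tag, or null if no such mismatch is found.
function firstTagMismatchOffset($html)
{
    // Elements that never take a closing tag (partial list, for the example).
    $void = ['br', 'img', 'hr', 'input', 'meta', 'link'];
    $stack = [];
    preg_match_all(
        '~<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>~',
        $html,
        $matches,
        PREG_SET_ORDER | PREG_OFFSET_CAPTURE
    );
    foreach ($matches as $m) {
        $offset = $m[0][1];
        $isClosing = ($m[1][0] === '/');
        $name = strtolower($m[2][0]);
        $selfClosing = ($m[3][0] === '/') || in_array($name, $void, true);
        if ($isClosing) {
            // A closing tag must match the innermost open tag.
            if (end($stack) !== $name) {
                return $offset;
            }
            array_pop($stack);
        } elseif (!$selfClosing) {
            $stack[] = $name;
        }
    }
    return null;
}

$html = 'Click</a>here</a>';
$offset = firstTagMismatchOffset($html);
echo substr($html, 0, $offset), "\n";
```

Tags that are opened but never closed are not reported by this sketch; detecting those would need an extra check on the stack once the loop ends.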
- v1.0.1 (9 Feb. 2018):
- Fix multibyte characters in regex
- Add parameter types verifications
- v1.0 (5 Feb. 2018):
- Initial version
- Inspired by: