This project and its related projects sound like a good idea, but really aren't.
Using the stripTags function could be dangerous. From https://golang.org/pkg/html/template/#hdr-Security_Model:
This package assumes that template authors are trusted
stripTags resides within html/template and operates under that assumption, which means that certain XSS attacks might go through undetected.
A fast, reliable and battle-tested library for stripping HTML tags is bluemonday.
It provides the bluemonday.StrictPolicy() mode:
bluemonday.StrictPolicy() is a mode which can be thought of as equivalent to stripping all HTML elements and their attributes as it has nothing on its whitelist. An example usage scenario would be blog post titles where HTML tags are not expected at all and if they are then the elements and the content of the elements should be stripped. This is a very strict policy.
Example:
stripped := bluemonday.StrictPolicy().Sanitize(`<a onblur="alert(secret)" href="http://www.google.com">Google</a>`)
// Output: Google
That is exactly what you want when stripping arbitrary HTML content: a library which understands XSS attacks and knows how to defuse them, even to the point of stripping all tags and leaving only plain text.
This Go package strips HTML tags from strings. No heavy lifting is done in this package itself: the unexported stripTags function from html/template/html.go does the actual work. All this package does is provide an exported function to access stripTags.
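To get a feel for what stripTags does without importing this package, here is a minimal sketch that relies only on the standard library. It assumes that html/template runs stripTags internally when a value of type template.HTML is placed into an HTML attribute, which is how we read html/template/html.go. The helper name renderTitle and the template text are made up for this illustration.

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// renderTitle is a hypothetical helper. It marks raw as trusted HTML and
// renders it into a quoted attribute value, a context in which tags make
// no sense, so html/template removes them before normalizing the text.
func renderTitle(raw string) (string, error) {
	tmpl, err := template.New("title").Parse(`<a title="{{.}}">x</a>`)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := tmpl.Execute(&b, template.HTML(raw)); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	out, err := renderTitle("Hello, <b>World</b>")
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // <a title="Hello, World">x</a>
}
```

Note that the value has to be declared trusted via template.HTML for this to happen, which matches the security model quoted above: the function was written for trusted template content, not for arbitrary user input.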
- The stripTags function in html/template/html.go could be really useful, however, it is not exported.
- Requests to export stripTags were made on GitHub without success.
- Several attempts exist to un-unexport the function (1, 2, 3), but all solutions lacked an easy upgrade path for changes made from upstream.
  - Most solutions take the content of all html/template files and put the content into one single file.
- This solution does not modify the original html/template source files. Instead, it copies all html/template files from Go source into this package and adds one export.go file, which adds a StripTags function (see Versioning for the whole workflow).
- Strip HTML from html strings (you don't say 😄)
- Convert HTML emails to plain text
- Display HTML strings in a CLI app context
- Convert HTML content into plain text for RSS feeds
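For the plain-text use cases above, keep in mind that stripping tags is probably not the whole job. As far as we can tell from the html/template tests, stripTags leaves character entities such as &amp; untouched, so a decoding step is likely needed afterwards. The sketch below uses only the standard library; the helper name toPlainText is hypothetical and its input stands in for text that has already been stripped of tags.

```go
package main

import (
	"fmt"
	"html"
)

// toPlainText is a hypothetical helper. It decodes the character entities
// that remain in text whose tags have already been removed.
func toPlainText(stripped string) string {
	return html.UnescapeString(stripped)
}

func main() {
	fmt.Println(toPlainText("Tom &amp; Jerry &lt;3")) // Tom & Jerry <3
}
```

The decoded result is plain text only. It may contain < and > again, so it should not be written back into an HTML page without escaping.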
Import the library with
import "github.com/denisbrodbeck/striphtmltags"
package main

import (
	"fmt"

	"github.com/denisbrodbeck/striphtmltags"
)

func main() {
	html := `<script>...</script> <b>¡Hi!</b>`
	got := striphtmltags.StripTags(html)
	fmt.Println(got)
	// Output: ¡Hi!
}
This package follows the Go release cycle.
On each new Go release we:
- download the new Go source files
- copy all files from $GOSRC/src/html/template/ into html/template
- add one function StripTags which calls stripTags
- run all unit tests
- commit all changes
- create a new tag matching the Go version (e.g. v1.9.2)
Build script:
#!/usr/bin/env bash
set -eru -o pipefail
# exit on error
# exit on uninitialized variables
# enter restricted shell https://www.gnu.org/s/bash/manual/html_node/The-Restricted-Shell.html
URL='https://redirector.gvt1.com/edgedl/go/go1.9.2.src.tar.gz'
curl -L --silent "$URL" -o "go.tar.gz"
tar -zxf "go.tar.gz"
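# the glob must stay outside the quotes, otherwise it is not expanded
rm -rf "html/template/"*
mkdir -p "html/template"
cp "go/LICENSE" "./"
cp "go/PATENTS" "./"
cp "go/VERSION" "./"
# "/." copies the directory contents on both GNU and BSD cp
cp -a "go/src/html/template/." "html/template/"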
cp "export.go.tpl" "html/template/export.go"
rm -f "go.tar.gz"
rm -rf "./go/"
This package uses the unexported stripTags function from html/template. That works for most normal use cases, when you want to completely strip HTML tags.
If you need to sanitize potentially unsafe user input while preserving some valid HTML tags, consider using an HTML sanitizer library such as Bluemonday.
The original Go license applies. Please have a look at the LICENSE file for more details.