html: add option to set MaxBuf in Parse #214
Open
+7
−0
I ran into an issue with html.Parse, which goes through the following call
chain: html.Parse -> ParseWithOptions -> p.parse() -> p.tokenizer.Next()
-> readByte(). The readByte() function contains this check:
    if z.maxBuf > 0 && z.raw.end-z.raw.start >= z.maxBuf {
        z.err = ErrBufferExceeded
        return 0
    }
This check only takes effect when maxBuf is set. However, html.Parse gives the
caller no way to call SetMaxBuf on the internal tokenizer, and there is no
exported ParseOption that sets it through ParseWithOptions either. As a result,
parsing a very large HTML document, such as this page:
http://vod.culture.ihns.cas.cn, can increase memory usage significantly.
To solve this problem, I wrote a function using reflection:
    func ParseOptionSetMaxBuf(maxBuf int) html.ParseOption {
        funcValue := reflect.MakeFunc(
            reflect.FuncOf([]reflect.Type{reflect.TypeOf((*html.ParseOption)(nil)).Elem().In(0)}, nil, false),
            func(args []reflect.Value) (results []reflect.Value) {
                parserValue := args[0].Elem()
    }
And then used it as follows:
    html.ParseWithOptions(bytes.NewReader(data), util.ParseOptionSetMaxBuf(len(data)*3))
In my testing, parsing worked normally as long as maxBuf was at least 1.04
times the body length.
Therefore, would it be feasible to introduce a function similar to
ParseOptionEnableScripting that allows users to set MaxBuf?
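As a rough sketch of what such an option could look like, the following self-contained model follows the same functional-option shape as ParseOptionEnableScripting. The `parser` and `tokenizer` types and `newParser` are simplified stand-ins written for this example, and `ParseOptionSetMaxBuf` is a hypothetical name; none of this is the package's actual code.

```go
package main

import "fmt"

// tokenizer is a simplified stand-in that only tracks the buffer limit.
type tokenizer struct{ maxBuf int }

// SetMaxBuf mirrors the name of the tokenizer method mentioned above.
func (z *tokenizer) SetMaxBuf(n int) { z.maxBuf = n }

// parser is a simplified stand-in for the unexported parser type.
type parser struct {
	tokenizer *tokenizer
	scripting bool
}

// ParseOption configures a parser before parsing starts.
type ParseOption func(p *parser)

// ParseOptionEnableScripting models the existing option for comparison.
func ParseOptionEnableScripting(enable bool) ParseOption {
	return func(p *parser) { p.scripting = enable }
}

// ParseOptionSetMaxBuf is the hypothetical new option: it forwards the
// limit to the parser's tokenizer.
func ParseOptionSetMaxBuf(n int) ParseOption {
	return func(p *parser) { p.tokenizer.SetMaxBuf(n) }
}

// newParser builds a parser and applies the options in order
// (a stand-in for the setup done inside ParseWithOptions).
func newParser(opts ...ParseOption) *parser {
	p := &parser{tokenizer: &tokenizer{}, scripting: true}
	for _, opt := range opts {
		opt(p)
	}
	return p
}

func main() {
	p := newParser(ParseOptionSetMaxBuf(1 << 20))
	fmt.Println(p.tokenizer.maxBuf)
}
```

Because the option is an ordinary closure over the parser, callers would no longer need reflection to reach the tokenizer.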
Environment: