
Segfault when calling tidyParseString with malformed input #1120

Open
gabe-sherman opened this issue Jul 30, 2024 · 1 comment

@gabe-sherman

The program below segfaults when given malformed input. The crash occurs at line 625 of parser.c, in InsertDocType. It happens when the code reads the parent field of a Node* that is already NULL.

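#include <stdio.h>
#include <stdlib.h>
#include <tidy.h>

int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <input-file>\n", argv[0]);
        return 1;
    }

    /* Read the whole input file into a NUL-terminated buffer. */
    FILE *f = fopen(argv[1], "rb");
    if (f == NULL) {
        perror(argv[1]);
        return 1;
    }
    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    rewind(f);

    char *v0 = size >= 0 ? malloc((size_t)size + 1) : NULL;
    if (v0 == NULL || fread(v0, 1, (size_t)size, f) != (size_t)size) {
        fprintf(stderr, "failed to read %s\n", argv[1]);
        free(v0);
        fclose(f);
        return 1;
    }
    fclose(f);
    v0[size] = '\0';

    /* With the malformed POC input, this call segfaults (see the ASAN report). */
    TidyDoc tdoc = tidyCreate();
    tidyParseString(tdoc, v0);

    tidyRelease(tdoc);
    free(v0);
    return 0;
}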
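tidyParseString takes a NUL-terminated C string, so if a fuzzer-generated POC contains NUL bytes, presumably only the bytes before the first NUL ever reach the parser. That can be worth checking when minimizing the crashing input. A minimal sketch, assuming a standalone helper: `read_whole_file` and `has_embedded_nul` are hypothetical names, not tidy functions. The helper loads a file the same way the reproducer does and reports whether a C-string API would see only a prefix of it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of tidy): read a whole file into a
 * NUL-terminated buffer, storing the byte count in *out_size.
 * Returns NULL on any I/O or allocation error; the caller frees the buffer. */
char *read_whole_file(const char *path, size_t *out_size)
{
    assert(path != NULL && out_size != NULL);

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;

    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0)
        size = ftell(f);
    if (size < 0) {
        fclose(f);
        return NULL;
    }
    rewind(f);

    char *buf = malloc((size_t)size + 1);
    if (buf == NULL || fread(buf, 1, (size_t)size, f) != (size_t)size) {
        free(buf);
        fclose(f);
        return NULL;
    }
    fclose(f);

    buf[size] = '\0';
    *out_size = (size_t)size;
    return buf;
}

/* True if the buffer contains a NUL before its end, i.e. a C-string API
 * such as tidyParseString would see only a prefix of the file. */
int has_embedded_nul(const char *buf, size_t size)
{
    return strlen(buf) < size;
}
```

If `has_embedded_nul` returns true for the POC, a minimized input can drop everything after the first NUL without changing what the parser sees.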

Test Environment

Ubuntu 22.04.4, 64-bit

How to trigger

./filename POC

Version

Latest (commit d08ddc2)

POC File

https://github.com/gabe-sherman/bug-pocs/blob/main/tidy-html5/c1

ASAN Report

=================================================================
==182978==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x5555558432f2 bp 0x7fffffffce90 sp 0x7fffffffce30 T0)
==182978==The signal is caused by a READ memory access.
==182978==Hint: address points to the zero page.
    #0 0x5555558432f2 in InsertDocType /home/gabriel/fuzzing-trials/tidy-html/lib_asan/src/parser.c:625:32
    #1 0x55555584a3bb in prvTidyParseHead /home/gabriel/fuzzing-trials/tidy-html/lib_asan/src/parser.c:2709:13
    #2 0x55555582833f in ParseHTMLWithNode /home/gabriel/fuzzing-trials/tidy-html/lib_asan/src/parser.c:1077:25
    #3 0x55555587deaa in prvTidyParseDocument /home/gabriel/fuzzing-trials/tidy-html/lib_asan/src/parser.c:6341:9
    #4 0x5555557dd3ef in prvTidyDocParseStream /home/gabriel/fuzzing-trials/tidy-html/lib_asan/src/tidylib.c:1509:9
    #5 0x5555557d5ab5 in tidyDocParseString /home/gabriel/fuzzing-trials/tidy-html/lib_asan/src/tidylib.c:1220:18
    #6 0x5555557d573c in tidyParseString /home/gabriel/fuzzing-trials/tidy-html/lib_asan/src/tidylib.c:1117:12
    #7 0x5555557cd4a1 in main /home/gabriel/fuzzing-trials/tidy-html/crashes/c1/reproducer.c:24:4
    #8 0x7ffff765fd8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #9 0x7ffff765fe3f in __libc_start_main csu/../csu/libc-start.c:392:3
    #10 0x5555556f43d4 in _start (/home/gabriel/fuzzing-trials/tidy-html/crashes/c1/c1.out+0x1a03d4) (BuildId: 0f6509d2d013898defc26ab226c81186debc92c4)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /home/gabriel/fuzzing-trials/tidy-html/lib_asan/src/parser.c:625:32 in InsertDocType
==182978==ABORTING
make: *** [Makefile:30: crash] Error 1
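The report shows a READ at exactly address 0x000000000000, which ASAN flags as pointing to the zero page. A read through a NULL struct pointer typically faults at the offset of the member being read, so one explanation consistent with this report is reading the member at offset 0 of a NULL node pointer. The sketch below maps a fault address back to a member for a hypothetical, simplified node layout; `SketchNode`, `SketchField` and `field_for_null_fault` are made-up names, and this is not tidy's actual Node definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified tree node; NOT tidy's real Node layout. */
typedef struct SketchNode {
    struct SketchNode *parent;
    struct SketchNode *prev;
    struct SketchNode *next;
    const char *element;
} SketchNode;

/* In this sketch, parent is deliberately the first member. */
static_assert(offsetof(SketchNode, parent) == 0,
              "parent is expected at offset 0 in this sketch");

typedef enum {
    FIELD_UNKNOWN,
    FIELD_PARENT,
    FIELD_PREV,
    FIELD_NEXT,
    FIELD_ELEMENT
} SketchField;

/* Given the address a sanitizer reports for a read through a NULL
 * SketchNode pointer (base 0 + member offset), return the member read. */
SketchField field_for_null_fault(uintptr_t fault_addr)
{
    if (fault_addr == offsetof(SketchNode, parent))  return FIELD_PARENT;
    if (fault_addr == offsetof(SketchNode, prev))    return FIELD_PREV;
    if (fault_addr == offsetof(SketchNode, next))    return FIELD_NEXT;
    if (fault_addr == offsetof(SketchNode, element)) return FIELD_ELEMENT;
    return FIELD_UNKNOWN;
}
```

Under this layout, a NULL read of `next` on a 64-bit build would instead be reported at 0x10, so the exact fault address narrows down which member was being read.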
@gabe-sherman
Author

Update on this: I did some digging to identify the root cause of this crash. It stems from ParseDocument setting an html node to the return value of InferredTag; the node returned by that call has a NULL parent. I've seen this at lines 6316 and 6352 of parser.c. The node is then passed into ParseHTMLWithNode, which in turn passes it into the corresponding parser function. Those parsers hand it to various functions that access parent pointers without first checking them for NULL.

I have seen the segfault occur at line 625 of parser.c (ParseHead calling InsertDocType) and at line 143 of parser.c (ParseInline calling InsertNodeAsParent). I don't know the API well enough to recommend a fix, but I noticed that ParseNamespace avoids this segfault with an assertion at line 4120 of parser.c. I hope this helps, thanks!
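The comment above describes helpers that follow parent pointers without checking for NULL, while ParseNamespace guards the same situation with an assertion. As a hypothetical illustration of a third option, a runtime NULL check that reports failure instead of crashing, here is a minimal sketch on a simplified node type. `GuardNode`, `find_ancestor_named` and `insert_before` are made-up names, not tidy's internals, and whether tidy should skip, repair, or report such a node is a maintainer decision this sketch does not settle.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical simplified tree node; not tidy's real Node struct. */
typedef struct GuardNode {
    struct GuardNode *parent;
    struct GuardNode *content;  /* first child */
    struct GuardNode *next;     /* next sibling */
    const char *name;           /* element name, e.g. "html"; NULL for text */
} GuardNode;

/* Walk up from `node` to the nearest ancestor-or-self named `name`.
 * Returns NULL when the parent chain ends first (e.g. a node whose parent
 * was never set) instead of dereferencing a NULL pointer. */
GuardNode *find_ancestor_named(GuardNode *node, const char *name)
{
    while (node != NULL && (node->name == NULL || strcmp(node->name, name) != 0))
        node = node->parent;
    return node;
}

/* Insert `newnode` as the sibling immediately before `element`.
 * Returns 0 on success, -1 if `element` is detached or the tree is
 * inconsistent; on failure nothing is modified. */
int insert_before(GuardNode *element, GuardNode *newnode)
{
    assert(element != NULL && newnode != NULL);
    GuardNode *parent = element->parent;
    if (parent == NULL)
        return -1;  /* detached node: no sibling list to insert into */

    if (parent->content == element) {
        parent->content = newnode;
    } else {
        GuardNode *prev = parent->content;
        while (prev != NULL && prev->next != element)
            prev = prev->next;
        if (prev == NULL)
            return -1;  /* element not found among its parent's children */
        prev->next = newnode;
    }
    newnode->parent = parent;
    newnode->next = element;
    return 0;
}
```

Compared with an assertion, the error return lets a caller decide how to recover, and release builds (where assertions are usually compiled out) no longer fall through to a NULL dereference.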
