Can't parse multipart/related followed by multipart/alternative #70

tomjaguarpaw · 2021-12-13T12:04:55Z

I am experiencing a problem with version 0.5. This problem was not present in 0.4.3.

I have several emails in my archive that have the following header:

Content-Type: multipart/related;
        boundary="<boundary>";
        type="multipart/alternative"

I the following parse error:

InvalidParameterValue "type" "multipart/alternative"

When parsing multipart/related we getRequiredParam "type" which succeeds and then parsed parseContentType which fails (https://github.com/purebred-mua/purebred-email/blob/v0.5/src/Data/MIME.hs#L928-L934). parseContentType explicitly looks for a boundary(https://github.com/purebred-mua/purebred-email/blob/v0.5/src/Data/MIME.hs#L593-L603) but there is none for multipart/alternative. The boundary here belongs to multipart/related.

I'm a bit confused at this point. I'm not familiar with the spec, but perhaps we should be looking for just something of form type="<...>/<...>" rather than something with another boundary? (If so then I'm surprised that I'm the first person to run into this.)

The text was updated successfully, but these errors were encountered:

tomjaguarpaw · 2021-12-13T12:14:26Z

This seems to be where the issue arose: 47a9e81#diff-4aeeafbfcb59f18a0a43c805646b96c69c0f72820b15ed2a24508621d2e0152dR684-R704

frasertweedale · 2021-12-13T12:21:20Z

It is a bug. Incidentally I was going to change this behaviour for the next release, but didn't quite get to it yet. In fact, this is the last thing I wanted to do before the next release (which also fixes the other issues you reported). But I got busy with other tasks.

Although I was aware the current behaviour deviates from the RFCs, I did not notice that it could fail in this way. I only thought it would accept some invalid data - not reject valid data. You have provided a nice test case :)

I shall endeavour to fix this and release v0.next as soon as possible. But I am so busy right now, it might not be until after Christmas. I apologise for the bug!

tomjaguarpaw · 2021-12-13T12:24:27Z

Thanks for the update and no need to apologise. I am enjoying using this library! If there's anything that I can help with in the meantime then please let me know. If it would be helpful, and you can give me a hint of what to fix, then I can try to fix it myself an submit a PR.

frasertweedale · 2021-12-13T12:45:08Z

Many thanks for the offer @tomjaguarpaw. And if I wasn't already halfway through the fix, I would take up your generous offer! But if you continue using this library, perhaps there is something you will contribute - besides your bug reports of course, which are themselves valuable contributions.

For the sake of transparency (and reducing bus factor, I suppose), I will sketch the problem and the approach I'm taking to fix it. RIght now we parse the "type" parameter as a full content-type, possibly with parameters, instead of just type and subtype as specified by RFC 2387. The fix is to introduce a new type for "type and subtype" - no parameters - and parse just that. The Related constructor must be updated to use the new type - an API breaking change. Then perhaps also refactor the existing ContentType data type and parser to reuse the new type and parser.

frasertweedale · 2021-12-13T12:53:16Z

I have several emails in my archive that have the following header:
Content-Type: multipart/related;
        boundary="<boundary>";
        type="multipart/alternative"

Side note: how nice to find an RFC-compliant multipart/related production :D

singpolyma · 2023-01-27T04:44:03Z

I just had to downgrade due to a related issue choking on this header: Content-Type: multipart/related;boundary=1_62BC61B6_29AF2480;Start="<smil>";Type="application/smil"

frasertweedale · 2023-01-27T07:20:46Z

@singpolyma thanks for the report. I will investigate in the next couple weeks.

frasertweedale · 2023-01-27T07:27:35Z

Never mind, I see the problem. <smil> is not a valid value for the start parameter. Per https://www.rfc-editor.org/rfc/rfc2387:

cid             := msg-id     ; c.f. [[822](https://www.rfc-editor.org/rfc/rfc2387#ref-822)]

and https://www.rfc-editor.org/rfc/rfc5322#section-3.6.4:

   msg-id          =   [CFWS] "<" id-left "@" id-right ">" [CFWS]

The value must contain an @, but does not.

@singpolyma Can you please provide more context? What program is producing this message, and how widespread are these non-conformant messages?

singpolyma · 2023-01-27T19:28:02Z

@singpolyma Can you please provide more context? What program is producing this message, and how widespread are these non-conformant messages?

I would say over half the MMS our system sees have a header with this param included and no @

frasertweedale · 2023-01-28T15:45:20Z

@singpolyma thanks. I don't know anything about the implementation of MMS. Is there a normative specification somewhere?

Which vendor(s) produce the non-conforming messages? Have you raised ticket(s) with them?

Would it be possible to detect and "massage" the non-conforming messages into conformance before processing them with purebred-email?

singpolyma · 2023-01-30T16:42:49Z

I could pre-massage but I'd basically need a header parser to do that with I guess.

I think the MMS messages are forwarded mostly-unmodified from various carriers, so convincing Verizon to change how they format their SMTP headers is probably not something I can do.

Using an older version of purebred is working for me for now.

frasertweedale · 2023-01-30T16:56:42Z

@singpolyma do you think you could send me some (real world, complete) sample messages that exhibit this non-conformance? And I'll have a think about how to approach this problem.

singpolyma · 2023-02-01T19:53:43Z

mms1.txt
This is an example message, with phone numbers scrubbed.

I've found another spec violation that comes from specifically Google Voice they send like this example:
mms2.txt

This Content-Type header does not have a "type" parameter at all.

It seems to me that both of these examples are syntactically valid, but violate various semantic requirements (required argument, syntax of particular argument in some contexts) and it would be nice if in these cases the message still parsed at all into parts, etc (which seems like it was more true on older versions of the library).

tomjaguarpaw mentioned this issue Dec 13, 2021

Add useful MIME optics? #71

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Can't parse multipart/related followed by multipart/alternative #70

Can't parse multipart/related followed by multipart/alternative #70

tomjaguarpaw commented Dec 13, 2021 •

edited

Loading

tomjaguarpaw commented Dec 13, 2021

frasertweedale commented Dec 13, 2021

tomjaguarpaw commented Dec 13, 2021

frasertweedale commented Dec 13, 2021

frasertweedale commented Dec 13, 2021

singpolyma commented Jan 27, 2023

frasertweedale commented Jan 27, 2023

frasertweedale commented Jan 27, 2023

singpolyma commented Jan 27, 2023 via email

frasertweedale commented Jan 28, 2023

singpolyma commented Jan 30, 2023

frasertweedale commented Jan 30, 2023

singpolyma commented Feb 1, 2023

Can't parse multipart/related followed by multipart/alternative #70

Can't parse multipart/related followed by multipart/alternative #70

Comments

tomjaguarpaw commented Dec 13, 2021 • edited Loading

tomjaguarpaw commented Dec 13, 2021

frasertweedale commented Dec 13, 2021

tomjaguarpaw commented Dec 13, 2021

frasertweedale commented Dec 13, 2021

frasertweedale commented Dec 13, 2021

singpolyma commented Jan 27, 2023

frasertweedale commented Jan 27, 2023

frasertweedale commented Jan 27, 2023

singpolyma commented Jan 27, 2023 via email

frasertweedale commented Jan 28, 2023

singpolyma commented Jan 30, 2023

frasertweedale commented Jan 30, 2023

singpolyma commented Feb 1, 2023

tomjaguarpaw commented Dec 13, 2021 •

edited

Loading