Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't parse multipart/related followed by multipart/alternative #70

Open
tomjaguarpaw opened this issue Dec 13, 2021 · 13 comments
Open

Comments

@tomjaguarpaw
Copy link

tomjaguarpaw commented Dec 13, 2021

I am experiencing a problem with version 0.5. This problem was not present in 0.4.3.

I have several emails in my archive that have the following header:

Content-Type: multipart/related;
        boundary="<boundary>";
        type="multipart/alternative"

I the following parse error:

InvalidParameterValue "type" "multipart/alternative"

When parsing multipart/related we getRequiredParam "type" which succeeds and then parsed parseContentType which fails (https://github.com/purebred-mua/purebred-email/blob/v0.5/src/Data/MIME.hs#L928-L934). parseContentType explicitly looks for a boundary(https://github.com/purebred-mua/purebred-email/blob/v0.5/src/Data/MIME.hs#L593-L603) but there is none for multipart/alternative. The boundary here belongs to multipart/related.

I'm a bit confused at this point. I'm not familiar with the spec, but perhaps we should be looking for just something of form type="<...>/<...>" rather than something with another boundary? (If so then I'm surprised that I'm the first person to run into this.)

@tomjaguarpaw
Copy link
Author

@frasertweedale
Copy link
Member

It is a bug. Incidentally I was going to change this behaviour for the next release, but didn't quite get to it yet. In fact, this is the last thing I wanted to do before the next release (which also fixes the other issues you reported). But I got busy with other tasks.

Although I was aware the current behaviour deviates from the RFCs, I did not notice that it could fail in this way. I only thought it would accept some invalid data - not reject valid data. You have provided a nice test case :)

I shall endeavour to fix this and release v0.next as soon as possible. But I am so busy right now, it might not be until after Christmas. I apologise for the bug!

@tomjaguarpaw
Copy link
Author

Thanks for the update and no need to apologise. I am enjoying using this library! If there's anything that I can help with in the meantime then please let me know. If it would be helpful, and you can give me a hint of what to fix, then I can try to fix it myself an submit a PR.

@frasertweedale
Copy link
Member

Many thanks for the offer @tomjaguarpaw. And if I wasn't already halfway through the fix, I would take up your generous offer! But if you continue using this library, perhaps there is something you will contribute - besides your bug reports of course, which are themselves valuable contributions.

For the sake of transparency (and reducing bus factor, I suppose), I will sketch the problem and the approach I'm taking to fix it. RIght now we parse the "type" parameter as a full content-type, possibly with parameters, instead of just type and subtype as specified by RFC 2387. The fix is to introduce a new type for "type and subtype" - no parameters - and parse just that. The Related constructor must be updated to use the new type - an API breaking change. Then perhaps also refactor the existing ContentType data type and parser to reuse the new type and parser.

@frasertweedale
Copy link
Member

I have several emails in my archive that have the following header:

Content-Type: multipart/related;
        boundary="<boundary>";
        type="multipart/alternative"

Side note: how nice to find an RFC-compliant multipart/related production :D

@singpolyma
Copy link
Contributor

I just had to downgrade due to a related issue choking on this header: Content-Type: multipart/related;boundary=1_62BC61B6_29AF2480;Start="<smil>";Type="application/smil"

@frasertweedale
Copy link
Member

@singpolyma thanks for the report. I will investigate in the next couple weeks.

@frasertweedale
Copy link
Member

Never mind, I see the problem. <smil> is not a valid value for the start parameter. Per https://www.rfc-editor.org/rfc/rfc2387:

cid             := msg-id     ; c.f. [[822](https://www.rfc-editor.org/rfc/rfc2387#ref-822)]

and https://www.rfc-editor.org/rfc/rfc5322#section-3.6.4:

   msg-id          =   [CFWS] "<" id-left "@" id-right ">" [CFWS]

The value must contain an @, but does not.

@singpolyma Can you please provide more context? What program is producing this message, and how widespread are these non-conformant messages?

@singpolyma
Copy link
Contributor

singpolyma commented Jan 27, 2023 via email

@frasertweedale
Copy link
Member

@singpolyma thanks. I don't know anything about the implementation of MMS. Is there a normative specification somewhere?

Which vendor(s) produce the non-conforming messages? Have you raised ticket(s) with them?

Would it be possible to detect and "massage" the non-conforming messages into conformance before processing them with purebred-email?

@singpolyma
Copy link
Contributor

I could pre-massage but I'd basically need a header parser to do that with I guess.

I think the MMS messages are forwarded mostly-unmodified from various carriers, so convincing Verizon to change how they format their SMTP headers is probably not something I can do.

Using an older version of purebred is working for me for now.

@frasertweedale
Copy link
Member

@singpolyma do you think you could send me some (real world, complete) sample messages that exhibit this non-conformance? And I'll have a think about how to approach this problem.

@singpolyma
Copy link
Contributor

mms1.txt
This is an example message, with phone numbers scrubbed.

I've found another spec violation that comes from specifically Google Voice they send like this example:
mms2.txt

This Content-Type header does not have a "type" parameter at all.

It seems to me that both of these examples are syntactically valid, but violate various semantic requirements (required argument, syntax of particular argument in some contexts) and it would be nice if in these cases the message still parsed at all into parts, etc (which seems like it was more true on older versions of the library).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants