Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inbound email: download from S3 + convert HTML to plaintext #5348

Merged
merged 7 commits into from
Dec 18, 2024

Conversation

vadimkerr
Copy link
Member

@vadimkerr vadimkerr commented Dec 9, 2024

What this PR does

  • Make AmazonSESValidatedInboundWebhookView able to download emails from S3 by providing AWS credentials via env variables
  • Convert HTML to plaintext when there's only text/html available

Which issue(s) this PR closes

Related to https://github.com/grafana/oncall-private/issues/2905

Checklist

  • Unit, integration, and e2e (if applicable) tests updated
  • Documentation added (or pr:no public docs PR label added if not required)
  • Added the relevant release notes label (see labels prefixed w/ release:). These labels dictate how your PR will
    show up in the autogenerated release notes.

@vadimkerr vadimkerr added pr:no public docs Added to a PR that does not require public documentation updates release:ignore PR will not be added to release notes labels Dec 9, 2024
@vadimkerr vadimkerr changed the title Inbound email: convert HTML to plaintext Inbound email: download from S3 + convert HTML to plaintext Dec 17, 2024
@vadimkerr vadimkerr marked this pull request as ready for review December 18, 2024 14:14
@vadimkerr vadimkerr requested a review from a team as a code owner December 18, 2024 14:14
}

@staticmethod
def html_to_plaintext(html: str) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yikes, converting HTML to plaintext looks complicated! 😬

Copy link
Member Author

@vadimkerr vadimkerr Dec 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the super quick review @joeyorlando ❤️
Yes converting HTML to plaintext is non-trivial (and technically not possible) because HTML is basically a "picture", not text. I added this mostly for backward compatibility so I tried to keep it similar to how Mailgun does it. Also see #5348 (comment)

Comment on lines +30 to +38
def __init__(self):
super().__init__(
session_params={
"aws_access_key_id": settings.INBOUND_EMAIL_AWS_ACCESS_KEY_ID,
"aws_secret_access_key": settings.INBOUND_EMAIL_AWS_SECRET_ACCESS_KEY,
"region_name": settings.INBOUND_EMAIL_AWS_REGION,
},
)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding AWS creds so OnCall can download emails from S3

@@ -74,11 +85,10 @@ def dispatch(self, request):
if request.method.lower() == "head":
return HttpResponse(status=status.HTTP_200_OK)

integration_token = self.get_integration_token_from_request(request)
if integration_token is None:
if self.integration_token is None:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a little bit of refactoring: instead of having a self.get_integration_token_from_request(request) method, it's now just a property self.integration_token. no functional changes, just looks better IMO

Comment on lines +186 to +187
elif email.html:
message = cls.html_to_plaintext(email.html)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the main change of the PR. If there's no plaintext representation for an email, we'll convert it from HTML to plaintext in the html_to_plaintext method below

Comment on lines +198 to +202
def html_to_plaintext(html: str) -> str:
"""
Converts HTML to plain text. Renders links as "text (href)" and removes any empty lines.
Converting HTML to plaintext is a non-trivial task, so this method may not work perfectly for all cases.
"""
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are ESPs such as Mailgun that provide the same HTML -> plaintext functionality but there are also ESPs such as AWS SES that don't do it. In order to keep things relatively backward compatible when migrating from Mailgun to AWS SES, we want to convert HTML to plaintext in OnCall (and do it at least somewhat similarly to how Mailgun does it).

@vadimkerr vadimkerr added this pull request to the merge queue Dec 18, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 18, 2024
@vadimkerr vadimkerr added this pull request to the merge queue Dec 18, 2024
github-merge-queue bot pushed a commit that referenced this pull request Dec 18, 2024
# What this PR does

* Make `AmazonSESValidatedInboundWebhookView` able to download emails
from S3 by providing AWS credentials via env variables
* Convert HTML to plaintext when there's only `text/html` available

## Which issue(s) this PR closes

Related to grafana/oncall-private#2905

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
    show up in the autogenerated release notes.
@vadimkerr vadimkerr removed this pull request from the merge queue due to a manual request Dec 18, 2024
@vadimkerr vadimkerr added this pull request to the merge queue Dec 18, 2024
Merged via the queue into dev with commit c36761e Dec 18, 2024
25 checks passed
@vadimkerr vadimkerr deleted the vadimkerr/iem-html branch December 18, 2024 16:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr:no public docs Added to a PR that does not require public documentation updates release:ignore PR will not be added to release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants