Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spam filter to work with flags from Rspamd and bogofilter #300

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 8 additions & 1 deletion afew/filters/HeaderMatchingFilter.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,14 @@ def __init__(self, database, **kwargs):
def handle_message(self, message):
if self.header is not None and self.pattern is not None:
if not self._tag_blacklist.intersection(message.get_tags()):
value = message.get_header(self.header)
if not isinstance(self.header, list):
self.header = [self.header]

try:
value = next(filter(None, map(message.get_header, self.header)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty hard to read. I assume it'll call message.get_header() with every header configured in the filter, and filter out results that are an empty string (which can be the case if one of the headers doesn't exist)?

What's the next() in front of everything doing here? Why is value still a single element? What happens if the filter returns multiple elements?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you read it right.

value is a single element, the first matching header. This is because the HeaderMatchingFilter only matches a string, it only has a pattern to match a string. Up until now the filter only matches to one header. For spam, as there are many spam flagging tools, you get different headers depending on the tool, thus you need to be able to search for many headers.
I assume, sysadmins don't run their many spam filters at one, they just pick a tool. Thus I pick the first not empty match(that is what next does), which might be the only match. Later I just let the the same logic of the filter run its way and match the pattern to the string. There I use the benefits of the regex. Spamassasing and Rspamd Mark with YES, their spam, bogofilter uses Spam.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it makes sense in the scope of SpamFilter, this changes semantics in HeaderMatchingFilter in a non-intuitive and undocumented way.

Can we expose this in HeaderMatchingFilter in a less confusing way, and have it documented?

except StopIteration:
return

match = self.pattern.search(value)
if match:
sub = (lambda tag:
Expand Down
4 changes: 2 additions & 2 deletions afew/filters/SpamFilter.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,8 +6,8 @@

class SpamFilter(HeaderMatchingFilter):
message = 'Tagging spam messages'
header = 'X-Spam-Flag'
pattern = 'YES'
header = ['X-Spam-Flag', 'X-Spam', 'X-Bogosity']
pattern = 'YES|Spam'

def __init__(self, database, tags='+spam', spam_tag=None, **kwargs):
if spam_tag is not None:
Expand Down