Let's Finally Solve the Q2A Spam Issues! ♥ #793

Open
bhadaway opened this issue Jan 28, 2020 · 9 comments

@bhadaway

Hi everyone,

I've been using Q2A for many years now, and the spam problem has never been solved. The half-baked add-ons that have tried to stop it have barely helped, and they're usually abandoned shortly after they're created, even when they did help.

Q2A offers a lot of moderation and anti-spam control, but it doesn't stop hundreds or even thousands of spam users from successfully registering and building up over and over, pretty much no matter what we try. I even had to come up with a workflow to deal with the problem because manually deleting these spam accounts is a nightmare:

https://bryanhadaway.com/how-to-delete-q2a-spam-users-in-bulk/

CAPTCHAs and similar methods just aren't working. The simplest, most elegant, most lightweight solution has been low-hanging fruit for a long time now. I'm talking about a honeypot! No more messy add-ons, just a few lines of code added to core, and we'll finally kill off 99.9% of spam.

Honeypot Code

HTML

Insert this additional input within any form you want to protect (mainly we're talking about the registration form):

<p class="url">URL:<br /><input type="url" name="url" value="https://example.com/" size="35" class="url" /></p>

CSS

Hide the input from normal users:

.url{display:none}

PHP

Wrap the PHP that's to be executed with this simple conditional:

if ( isset( $_POST['url'] ) && $_POST['url'] === 'https://example.com/' ) {
	// ...form-handling code to execute here...
}

The logic is very simple. Automated bots usually ignore CSS and JS and work directly on the bare HTML form, so they see the field that humans never do. They sniff out the irresistible URL field and spam their link into it. If its value is changed from the default in any way, the form doesn't execute.
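To make the rule concrete, here is a minimal sketch of the server-side check, written in Python for illustration (Q2A itself is PHP). The field name and default value come from the snippet above; `passes_honeypot` and `handle_registration` are hypothetical names, not Q2A functions.

```python
# Values copied from the HTML snippet above; rename to match your own form.
HONEYPOT_FIELD = "url"
HONEYPOT_DEFAULT = "https://example.com/"


def passes_honeypot(form: dict) -> bool:
    """True only if the hidden decoy field came back exactly as rendered.

    A missing field also fails, because a real browser always submits the
    CSS-hidden input with its default value.
    """
    return form.get(HONEYPOT_FIELD) == HONEYPOT_DEFAULT


def handle_registration(form: dict) -> str:
    """Hypothetical handler: run the honeypot check before anything else."""
    if not passes_honeypot(form):
        # Refuse quietly; telling the bot why only helps it adapt.
        return "rejected"
    # ...the normal registration code would run here...
    return "accepted"


print(handle_registration({"handle": "alice", "url": "https://example.com/"}))  # accepted
print(handle_registration({"handle": "bot", "url": "https://spam.example/"}))   # rejected
```

Rejecting a missing field is a deliberate choice: inputs hidden with `display:none` are still submitted by browsers, so only a client that builds its own request would leave the field out.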

I've used this method on all my projects for years, and while humans can still manually spam the form, automated bot spam has never gotten through.

In Q2A, I'm pretty sure this would need to be implemented in one or both of these files:

https://github.com/q2a/question2answer/blob/dev/qa-include/pages/register.php
https://github.com/q2a/question2answer/blob/dev/qa-include/qa-theme-base.php

But the code in those files is more complicated, so I'm not sure exactly how to implement the method in Q2A.

Thank you!

@svivian
Collaborator

svivian commented May 10, 2020

Just to let you know, I have been testing some options for this. I first tried a honeypot on the Q2A site and it was pretty ineffective - caught about 5 bots over several days whereas hundreds of spam signups got through. I've analysed Apache logs in the past and the spam signups appeared to be actual people, loading the pages in a real browser. That's still the case now.

However, I have tested the same thing on another site and it worked much better there - over 100 registration attempts blocked (around 40%). So nowhere near 99% but still worth doing I think.

The code I added was pretty short - just a few lines in the register.php file. Unfortunately it's not really possible as a plugin - currently only one captcha plugin can be used at a time, so we can't have reCAPTCHA with honeypots as a secondary layer. And filter plugins don't currently handle full registrations, only usernames/emails.

No harm to having it in the core though, so I'll look at integrating this into v1.9. I'm also looking at other options, for example being able to mark posts as spam so they can be separated from regular moderation. Plus some bulk-delete tools.

@bhadaway
Author

Awesome! This is great news.

> I've analysed Apache logs in the past and the spam signups appeared to be actual people, loading the pages in a real browser. That's still the case now.

Given the rate at which new spam registrations are happening, this seems unlikely. Most likely, they're just spoofing the user agent to look like a normal browser visitor, but it's still automated.

> Unfortunately it's not really possible as a plugin - currently only one captcha plugin can be used at a time, so we can't have reCAPTCHA with honeypots as a secondary layer.

On one project, while it was definitely a bit of a puzzle, I did get the honeypot working alongside reCAPTCHA, plus a few other layers, like killing the form submission if spam words were found. I'll email you the code if you think it could work in Q2A.

Either way, I think this is a step in the right direction. We could always experiment with different honeypots too, or get creative and come up with something more unique to fighting Q2A spammers specifically, based on any unique patterns you've noticed over the years.

This one is a WordPress plugin, but its general concept interested me: a trap that keeps getting better over time as it catches more bots, and that would save a lot of server resources:

https://wordpress.org/plugins/blackhole-bad-bots/
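As I understand the Blackhole idea, a link hidden from humans points to a URL that robots.txt disallows, so only bots that ignore robots.txt follow it, and any client that requests it is banned from then on. Below is a minimal in-memory sketch of that idea in Python; `TRAP_PATH` and the `Blackhole` class are hypothetical names, not the plugin's code.

```python
# Hypothetical trap URL; robots.txt tells well-behaved crawlers to stay away.
TRAP_PATH = "/blackhole/"
ROBOTS_TXT = f"User-agent: *\nDisallow: {TRAP_PATH}\n"


class Blackhole:
    """Ban any client that requests the trap URL, then refuse all its requests."""

    def __init__(self, trap_path: str = TRAP_PATH) -> None:
        self.trap_path = trap_path
        self.banned = set()  # IP addresses caught in the trap

    def handle(self, ip: str, path: str) -> int:
        """Return the HTTP status code to send for this request."""
        if ip in self.banned:
            return 403
        if path.startswith(self.trap_path):
            # Only something that ignored robots.txt and followed the
            # hidden link should ever get here.
            self.banned.add(ip)
            return 403
        return 200


bh = Blackhole()
print(bh.handle("203.0.113.7", "/questions"))   # 200
print(bh.handle("203.0.113.7", "/blackhole/"))  # 403, now banned
print(bh.handle("203.0.113.7", "/register"))    # 403 from here on
```

A real deployment would need to persist the ban list and whitelist known good crawlers, since a misbehaving search engine bot would otherwise ban itself.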

Also, a firewall would be pretty cool, not as a plugin but as a core feature enabled or disabled with a simple checkbox. It could help fight spam at the server level without needing to change the form code:

https://perishablepress.com/7g-firewall/
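As far as I know, 7G itself is a set of Apache/Nginx rules, so it runs in the web server before PHP is ever reached. The core idea is pattern matching: each request's query string, user agent and so on are checked against known attack patterns, and matches are refused. A Python sketch of that matching step follows; the rules are made-up examples, not the 7G rule set.

```python
import re
from urllib.parse import unquote_plus

# Hypothetical example rules for illustration only; these are NOT the 7G rules.
QUERY_RULES = re.compile(r"(<script|union\s+select|\.\./)", re.IGNORECASE)
AGENT_RULES = re.compile(r"(sqlmap|nikto)", re.IGNORECASE)


def is_blocked(query_string: str, user_agent: str) -> bool:
    """Refuse the request if any rule matches, before the application runs."""
    # Decode %xx escapes and "+" first so encoded attacks are seen in plain form.
    decoded = unquote_plus(query_string)
    return bool(QUERY_RULES.search(decoded) or AGENT_RULES.search(user_agent))


print(is_blocked("qa=questions&sort=hot", "Mozilla/5.0"))       # False
print(is_blocked("id=1+UNION+SELECT+password", "Mozilla/5.0"))  # True
```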

Thanks, Scott

@svivian
Collaborator

svivian commented May 11, 2020

> Most likely, they're just spoofing the user agent to appear as a normal browser visitor, but it's still just automated.

It's not just spoofing, it's loading the page as a browser would (including CSS, images etc). So it's definitely a real browser, but perhaps partly automated.

> I did successfully get the honeypot to work with reCAPTCHA

Do you mean integrated as part of the same plugin? That would be possible, but it means every other captcha plugin needs to do the same as well. However we could potentially change up how captchas work to allow multiple captchas per form, and different captchas for registration vs posting.
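A rough sketch of what "multiple captchas per form, and different captchas for registration vs posting" could look like: each form type gets its own list of independent checks, and a submission must pass all of them. It's written in Python for illustration; `SpamChecks`, `honeypot`, `captcha_stub` and the form type names are hypothetical and not Q2A's captcha plugin interface.

```python
from typing import Callable, Dict, List

# A check receives the submitted fields and returns True if they look legitimate.
Check = Callable[[dict], bool]


class SpamChecks:
    """Hypothetical registry: any number of checks per form type."""

    def __init__(self) -> None:
        self._checks: Dict[str, List[Check]] = {}

    def register(self, form_type: str, check: Check) -> None:
        self._checks.setdefault(form_type, []).append(check)

    def passes(self, form_type: str, form: dict) -> bool:
        # Every layer must accept; all() stops at the first check that fails.
        # A form type with no registered checks passes by default.
        return all(check(form) for check in self._checks.get(form_type, []))


def honeypot(form: dict) -> bool:
    return form.get("url") == "https://example.com/"


def captcha_stub(form: dict) -> bool:
    # Stand-in for a real captcha verification call.
    return form.get("captcha_ok") is True


checks = SpamChecks()
checks.register("register", honeypot)
checks.register("register", captcha_stub)
checks.register("post", captcha_stub)  # no honeypot on the posting form
```

Keeping each check independent means a captcha plugin no longer has to bundle the honeypot itself; the core just runs every layer registered for that form.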

> plus a few other layers, like killing the form submission if spam words were found.

I have this in place too with a custom filter plugin. But they just changed the phrases they used, almost immediately (which further led me to believe it's human activity).

> I'll email you the code if you think it might be possible to get it to work on Q2A.

That would be nice, it's worth looking into.

@bhadaway
Author

Actually, there's a stand-alone PHP version of the Blackhole honeypot:

https://perishablepress.com/blackhole-bad-bots/

I still like my form honeypot and think it should stay, but this honeypot is better and more of a catch-all for all pages. Could be a nice additional layer. Scrap any of my other thoughts.

Anyway, did you add the honeypot in version 1.8.4, or is that not happening until 1.8.5?

Thanks

@svivian
Collaborator

svivian commented May 13, 2020

No the honeypot is not in Q2A yet, I’ve only been testing it so far.

@Calinou

Calinou commented Dec 11, 2020

Speaking of spam, is there a way to systematically reject posts if they contain a specific word? I'd rather not have to click the Reject button for those posts myself. Most spam posts I've dealt with on a Q&A platform I moderate use very specific keywords that legitimate users never write.

Also, is there a plugin or SQL query I can use to remove accounts with no posts that are older than 1 month? I could run something like that every day or week. This would be useful to clean up the database.
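A read-only sketch of the selection part, using Python's built-in `sqlite3` with simplified tables named after Q2A's `qa_users` and `qa_posts` (the default `qa_` prefix and the `created` and `userid` columns are assumptions). It only lists candidates: deleting rows straight from `qa_users` would likely leave orphaned rows in Q2A's other user-related tables, so the actual removal is safer done through Q2A itself. On MySQL the cutoff could be computed in the query with `NOW() - INTERVAL 1 MONTH` instead of being passed in.

```python
import sqlite3

# Simplified stand-ins for Q2A's tables (assumed names and columns).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE qa_users (userid INTEGER PRIMARY KEY, handle TEXT, created TEXT);
    CREATE TABLE qa_posts (postid INTEGER PRIMARY KEY, userid INTEGER);
    INSERT INTO qa_users VALUES
        (1, 'alice',   '2020-10-01 09:00:00'),
        (2, 'spambot', '2020-10-05 12:00:00'),
        (3, 'newbie',  '2020-12-05 18:00:00');
    INSERT INTO qa_posts VALUES (10, 1);
    """
)

# Users created before the cutoff who have never posted anything.
STALE_USERS_SQL = """
    SELECT u.userid, u.handle
    FROM qa_users AS u
    WHERE u.created < ?
      AND NOT EXISTS (SELECT 1 FROM qa_posts AS p WHERE p.userid = u.userid)
    ORDER BY u.userid
"""


def stale_users(conn: sqlite3.Connection, cutoff: str) -> list:
    return conn.execute(STALE_USERS_SQL, (cutoff,)).fetchall()


print(stale_users(conn, "2020-11-11 00:00:00"))  # [(2, 'spambot')]
```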

@Anton-V-K

@Calinou, there is a standard option in the Admin panel under Posting, "Censored words - separate by spaces or commas", where you can list "dirty words" (it even lets you use "*" to stand for any letters).
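To illustrate how such a word list with "*" wildcards can be matched, here is a small Python sketch. It assumes "*" stands for any run of letters (possibly none) and that matching is case-insensitive on whole words; Q2A's exact matching rules may differ, and `censored_pattern` is a hypothetical name.

```python
import re


def censored_pattern(word_list: str) -> "re.Pattern[str]":
    """Compile a "separate by spaces or commas" word list into one regex.

    Assumption: "*" stands for any run of letters (possibly empty) and
    matching is case-insensitive on whole words.
    """
    terms = [t for t in re.split(r"[\s,]+", word_list) if t]
    if not terms:
        return re.compile(r"(?!)")  # empty list: a pattern that never matches
    letters = r"[^\W\d_]*"  # zero or more Unicode letters
    parts = [letters.join(re.escape(piece) for piece in term.split("*")) for term in terms]
    return re.compile(r"\b(?:" + "|".join(parts) + r")\b", re.IGNORECASE)


pattern = censored_pattern("spam, casino* viagra")
print(bool(pattern.search("Best CASINOS online")))  # True
print(bool(pattern.search("I hate spammers")))      # False
```

The whole-word boundaries are what keep "spam" from matching "spammers"; writing "spam*" in the list would catch both.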

@Calinou

Calinou commented Dec 12, 2020

@Anton-V-K I've seen that option, but do censored words automatically reject posts in the moderation queue?

@Calinou

Calinou commented Apr 7, 2022

Bump 🙂

I've found over time that posts containing censored words are not automatically rejected in the moderation queue, and I haven't found a way to do so. This is needed for effective spam protection, so I don't get carpal tunnel from clicking Reject on all the obvious spam posts.

To further lighten the moderation load, there could also be a way to automatically approve posts that do not contain any URLs. In my experience, most (but not all) spam posts will contain at least one URL.
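A sketch of the triage rule described above, in Python: reject posts with a censored word, auto-approve posts with no URL, and leave the rest for a human. This is hypothetical logic, not an existing Q2A feature; `CENSORED`, `URL_RE` and `triage` are illustrative names, and the word list is a made-up example.

```python
import re

# Hypothetical censored-word pattern; in practice it would be built from the
# admin's word list.
CENSORED = re.compile(r"\b(?:casino|viagra)\b", re.IGNORECASE)
# Deliberately simple: catches http(s):// links and www. hosts only.
URL_RE = re.compile(r"https?://|www\.", re.IGNORECASE)


def triage(text: str) -> str:
    """Decide what to do with a post waiting in the moderation queue."""
    if CENSORED.search(text):
        return "reject"   # contains a censored word: obvious spam
    if not URL_RE.search(text):
        return "approve"  # no links at all: most spam carries at least one
    return "queue"        # has a link but no censored word: leave it for a human


for post in ["Best casino bonus at http://spam.example",
             "How do I sort a list?",
             "See https://docs.python.org/3/ for details"]:
    print(triage(post))  # reject, approve, queue
```

The censored-word check runs first so a spam post without a link is still rejected rather than approved. Bare domains such as "spam.example" slip past this URL check and would be auto-approved, which is the price of keeping the rule simple.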


5 participants
@Calinou @svivian @bhadaway @Anton-V-K and others