I went through the list of RegExps that isbot provides and removed every one where we overlap. It is worth noting that isbot uses case-insensitive RegExps, while we use case-sensitive ones here.
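To illustrate why the case difference matters when porting patterns, here is a minimal sketch. It uses Python's `re` as a stand-in for JavaScript RegExps, and the user agent string is a made-up example. An isbot-style lowercase pattern matches a mixed-case agent only with the ignore-case flag; compiled case-sensitively, it misses unless the input is lowercased first (lowercasing is just one possible workaround, not something we currently do).

```python
import re

# A pattern from isbot's list. isbot writes its patterns in lowercase
# and applies them case-insensitively.
PATTERN = r"^php"

isbot_style = re.compile(PATTERN, re.IGNORECASE)  # mimics isbot's matching
case_sensitive = re.compile(PATTERN)              # mimics our matching

# Hypothetical user agent string, used only for illustration.
ua = "PHP/8.1"

print(bool(isbot_style.search(ua)))             # True
print(bool(case_sensitive.search(ua)))          # False
# Possible workaround: lowercase the input before a case-sensitive match.
print(bool(case_sensitive.search(ua.lower())))  # True
```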
I've grouped them below.
Generic patterns:
I think we'll discard all of these because they are too generic to identify a specific bot.
[ ]+bot
^[a-z.0-9/ \-_]*bot
^email
^java/
^javascript
^php
analyzer
archiver
bot($|[/\);-]+)
checker
cloudflare
crawler
extractor
fetcher
http[s]?://
monitoring
optimizer
robot
scraper
spider
transcoder
uptime
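A quick sketch of why these generic patterns don't help us name a bot, using a few of the patterns above (in Python, case-insensitive as in isbot). The first two user agents are abbreviated samples of Google's and Bing's crawlers; the last two are made-up names. Both real crawlers hit exactly the same generic pattern, so a match only tells us "some bot", not which one.

```python
import re

# A few of the generic isbot patterns listed above.
GENERIC = [r"crawler", r"spider", r"bot($|[/\);-]+)"]

# Sample user agents for illustration (abbreviated; the last two are made up).
SAMPLE_UAS = [
    "Googlebot/2.1 (+http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; bingbot/2.0)",
    "ExampleCrawler/1.0",
    "AcmeSpider/0.3",
]


def generic_hits(ua: str) -> list[str]:
    """Return the generic patterns that match ``ua`` (case-insensitively)."""
    return [p for p in GENERIC if re.search(p, ua, re.IGNORECASE)]


for ua in SAMPLE_UAS:
    # Googlebot and bingbot both match only the "bot(...)" pattern, so the
    # match alone cannot distinguish them.
    print(ua, "->", generic_hits(ua))
```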
Not user agents:
I don't think these are actual user agents, but I need to double-check.
google page speed insight
google search
Little to no overlap:
We don't seem to match any of these, or they overlap with a different pattern.
^ad muncher
^avsdevicesdk/
^bidtellect/
^blogtrottr
^boardreader
^castro
^collectd
^comodo
^cortex
^ddg_android
^ez publish
^fdm[\s/]\d
^holmes
^lua-resty-http
^navermailapp
^netlyzer fastprobe
^netsurf
^newsgator
^octopus
^postmanruntime
^prittorrent
^rainmeter
^ramblermail
^server density
^sitesucker
^snapchat
^spotify/
^sprinklr
^the knowledge ai
^unityplayer
^websitepulse
^windows-rss
^wsr-agent
^yahoo:linkexpander
^yahoocachesystem
^zooshot
apachebench/
arachni
banca caboto
browsershots
catchpoint
curious george
datadog agent
daum(oa)?[ /][0-9] - there is an "instance" of this in "mediapartners-google" because it mimics
dmbrowser
duplexweb-google
gobuster
gomezagent
googleimageproxy
goose/
guzzlehttp
help@dataminr\.com
heritrix - we classify this as Internet Archive and may want to fine-tune it
We should compare the list of user agents we match against isbot's to see whether we are missing any.
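One way that comparison could be sketched: run both pattern sets over a corpus of user agents and keep the ones isbot flags but we don't. Everything here is hypothetical: the pattern subsets, the `OUR_PATTERNS` name-to-regex mapping, and the sample user agents are stand-ins, not our real configuration. isbot's patterns are applied case-insensitively and ours case-sensitively, matching the note at the top.

```python
import re

# Hypothetical subset of isbot's patterns (applied case-insensitively, as isbot does).
ISBOT_PATTERNS = [r"^postmanruntime", r"gobuster", r"guzzlehttp", r"heritrix"]

# Hypothetical subset of our named patterns (applied case-sensitively).
OUR_PATTERNS = {
    "Internet Archive": r"heritrix",
}


def matched_by_isbot(ua: str) -> bool:
    return any(re.search(p, ua, re.IGNORECASE) for p in ISBOT_PATTERNS)


def matched_by_us(ua: str) -> bool:
    return any(re.search(p, ua) for p in OUR_PATTERNS.values())


def missing_from_ours(user_agents: list[str]) -> list[str]:
    """User agents that isbot flags as bots but none of our patterns match."""
    return [ua for ua in user_agents if matched_by_isbot(ua) and not matched_by_us(ua)]


# Made-up sample corpus; in practice this would be real logged user agents.
SAMPLE = [
    "PostmanRuntime/7.36.0",
    "Mozilla/5.0 (compatible; heritrix/3.4.0 +http://archive.org)",
    "gobuster/3.6",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
]

print(missing_from_ours(SAMPLE))  # ['PostmanRuntime/7.36.0', 'gobuster/3.6']
```

Swapping the two predicates in `missing_from_ours` gives the reverse check: agents we match that isbot does not.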