Merge pull request #54 from bobmatyas/updates/082824
Updates/082824
bobmatyas authored Aug 29, 2024
2 parents 5e18b92 + 24f7bc9 commit 0b798c2
Showing 4 changed files with 181 additions and 96 deletions.
59 changes: 57 additions & 2 deletions block-ai-crawlers.php
@@ -5,7 +5,7 @@
* Author: Bob Matyas
* Author URI: https://www.bobmatyas.com
* Text Domain: block-ai-crawlers
-* Version: 1.3.8
+* Version: 1.3.9
* License: GPL-2.0-or-later
* License URI: https://www.gnu.org/licenses/gpl-2.0.html
*
@@ -28,6 +28,7 @@
*/
function block_ai_robots_txt( $robots ) {
$robots .= "\n# Block AI Crawlers\n\n";
+$robots .= "User-agent: AI2Bot\n";
$robots .= "User-agent: AmazonBot\n";
$robots .= "User-agent: Applebot-Extended\n";
$robots .= "User-agent: anthropic-ai\n";
@@ -44,11 +45,15 @@ function block_ai_robots_txt( $robots ) {
$robots .= "User-agent: ImagesiftBot\n";
$robots .= "User-agent: Meta-ExternalAgent\n";
$robots .= "User-agent: Meta-ExternalFetcher\n";
+$robots .= "User-agent: OAI-SearchBot\n";
$robots .= "User-agent: Omgili\n";
$robots .= "User-agent: Omgilibot\n";
+$robots .= "User-agent: PetalBot\n";
$robots .= "User-agent: PerplexityBot\n";
-$robots .= "User-agent: Timpibot\n";
+$robots .= "User-agent: Timpibot\n";
$robots .= "User-agent: YouBot\n";
+$robots .= "User-agent: webzio\n";
+$robots .= "User-agent: webzio-extended\n";
$robots .= "Disallow: /\n\n";
$robots .= "# End Block AI Crawlers\n";
return ( $robots );
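Each `User-agent` line that `block_ai_robots_txt()` appends joins one robots.txt group, and the single `Disallow: /` line after them applies to every agent in that group. A minimal sketch, assuming a hypothetical helper `block_ai_build_robots_block()` that takes the agents as an array instead of one hard-coded `.=` per crawler; it drops exact duplicates and sorts case-insensitively (an assumption, so mixed-case tokens such as `anthropic-ai` and `Applebot-Extended` may come out in a different order than the committed list). It uses no WordPress functions and is not the plugin's code.

```php
<?php
/**
 * Hypothetical helper: builds the "Block AI Crawlers" robots.txt group
 * from a list of user-agent tokens. Not the plugin's actual code.
 *
 * @param string[] $agents User-agent tokens, in any order.
 * @return string The block to append to robots.txt.
 */
function block_ai_build_robots_block( array $agents ): string {
	// Drop exact duplicates, then sort ignoring case so the output order
	// does not depend on how the array was written.
	$agents = array_values( array_unique( $agents ) );
	usort( $agents, 'strcasecmp' );

	$block = "\n# Block AI Crawlers\n\n";
	foreach ( $agents as $agent ) {
		// Every User-agent line belongs to the same group...
		$block .= "User-agent: {$agent}\n";
	}
	// ...so this one rule applies to all of them.
	$block .= "Disallow: /\n\n";
	$block .= "# End Block AI Crawlers\n";

	return $block;
}

$robots = block_ai_build_robots_block(
	array( 'YouBot', 'AI2Bot', 'PetalBot', 'OAI-SearchBot', 'AI2Bot' )
);
echo $robots;
```

Inside the plugin, the returned string would presumably be appended to `$robots` in the same filter callback shown above.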
@@ -74,3 +79,53 @@ function block_ai_activate() {
}

register_activation_hook( __FILE__, 'block_ai_activate' );
+
+add_filter( 'plugin_action_links', 'block_ai_prepend_plugin_settings_link', 10, 2 );
+
+/**
+* Adds settings link to plugins page
+*
+* @param array $links_array An array of the plugin's metadata.
+* @param string $plugin_file_name Path to the plugin file.
+* @return array $links_array
+*/
+function block_ai_prepend_plugin_settings_link( $links_array, $plugin_file_name ) {
+if ( strpos( $plugin_file_name, basename( __FILE__ ) ) ) {
+array_unshift( $links_array, '<a href=" ' . get_admin_url() . ' options-general.php?page=block-ai-crawlers">Settings</a>' );
+}
+return $links_array;
+}
+
+
+/**
+* Adds ratings nudge to plugins page
+*
+* @access public
+* @param array $links_array An array of the plugin's metadata.
+* @param string $plugin_file_name Path to the plugin file.
+* @return array $links_array
+*/
+function block_ai_append_plugin_rating( $links_array, $plugin_file_name ) {
+if ( strpos( $plugin_file_name, basename( __FILE__ ) ) ) {
+
+$links_array[] = "<a href='https://wordpress.org/support/plugin/block-ai-crawlers/reviews/#new-post' target='_blank' title='Rate 5 Stars'>
+<i class='rate-stars'>"
+. "<svg xmlns='http://www.w3.org/2000/svg' width='15' height='15' viewBox='0 0 24 24' fill='none' stroke='currentColor' stroke-width='2' stroke-linecap='round' stroke-linejoin='round' class='feather feather-star'><polygon points='12 2 15.09 8.26 22 9.27 17 14.14 18.18 21.02 12 17.77 5.82 21.02 7 14.14 2 9.27 8.91 8.26 12 2'/></svg>"
+. "<svg xmlns='http://www.w3.org/2000/svg' width='15' height='15' viewBox='0 0 24 24' fill='none' stroke='currentColor' stroke-width='2' stroke-linecap='round' stroke-linejoin='round' class='feather feather-star'><polygon points='12 2 15.09 8.26 22 9.27 17 14.14 18.18 21.02 12 17.77 5.82 21.02 7 14.14 2 9.27 8.91 8.26 12 2'/></svg>"
+. "<svg xmlns='http://www.w3.org/2000/svg' width='15' height='15' viewBox='0 0 24 24' fill='none' stroke='currentColor' stroke-width='2' stroke-linecap='round' stroke-linejoin='round' class='feather feather-star'><polygon points='12 2 15.09 8.26 22 9.27 17 14.14 18.18 21.02 12 17.77 5.82 21.02 7 14.14 2 9.27 8.91 8.26 12 2'/></svg>"
+. "<svg xmlns='http://www.w3.org/2000/svg' width='15' height='15' viewBox='0 0 24 24' fill='none' stroke='currentColor' stroke-width='2' stroke-linecap='round' stroke-linejoin='round' class='feather feather-star'><polygon points='12 2 15.09 8.26 22 9.27 17 14.14 18.18 21.02 12 17.77 5.82 21.02 7 14.14 2 9.27 8.91 8.26 12 2'/></svg>"
+. "<svg xmlns='http://www.w3.org/2000/svg' width='15' height='15' viewBox='0 0 24 24' fill='none' stroke='currentColor' stroke-width='2' stroke-linecap='round' stroke-linejoin='round' class='feather feather-star'><polygon points='12 2 15.09 8.26 22 9.27 17 14.14 18.18 21.02 12 17.77 5.82 21.02 7 14.14 2 9.27 8.91 8.26 12 2'/></svg>"
+. '</i></a>';
+
+$stars_color = '#ffb900';
+
+echo '<style>'
+. '.rate-stars{display:inline-block;color:' . $stars_color . ';position:relative;top:3px;}'
+. '.rate-stars svg {fill:' . $stars_color . ';}'
+. '</style>';
+}
+
+return $links_array;
+}
+
+add_filter( 'plugin_row_meta', 'block_ai_append_plugin_rating', 10, 4 );
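As committed, `block_ai_prepend_plugin_settings_link()` puts spaces inside the `href` quotes (`'<a href=" ' . get_admin_url() . ' options-general.php…'`), so the generated URL contains a space between the admin URL and `options-general.php`, which a browser would most likely percent-encode into the path rather than drop. Separately, `strpos()` returns `0` for a match at offset zero, which the `if` treats as false; it works in the usual `block-ai-crawlers/block-ai-crawlers.php` layout only because the match is at a non-zero offset. The sketch below recasts both callbacks as pure functions with hypothetical names so it runs outside WordPress: `$admin_url` stands in for what `admin_url()` would return, and `$own_basename` for `plugin_basename( __FILE__ )`. The five pasted star icons become one `str_repeat()` call.

```php
<?php
/**
 * Hypothetical, WordPress-free versions of the two plugins-page callbacks.
 * $own_basename stands in for plugin_basename( __FILE__ ) and $admin_url
 * for the value admin_url() would return.
 */

// Strict comparison: true only for this plugin's own row.
function block_ai_is_own_row( string $plugin_file, string $own_basename ): bool {
	return $plugin_file === $own_basename;
}

// Builds the "Settings" link with no stray spaces inside the href.
function block_ai_settings_link( string $admin_url ): string {
	$url = rtrim( $admin_url, '/' ) . '/options-general.php?page=block-ai-crawlers';
	return '<a href="' . htmlspecialchars( $url, ENT_QUOTES ) . '">Settings</a>';
}

// Prepends the link, mirroring array_unshift() in the committed code.
function block_ai_prepend_settings_link( array $links, string $plugin_file, string $own_basename, string $admin_url ): array {
	if ( block_ai_is_own_row( $plugin_file, $own_basename ) ) {
		array_unshift( $links, block_ai_settings_link( $admin_url ) );
	}
	return $links;
}

// Generates the identical star icons instead of pasting the SVG five times.
function block_ai_rating_stars( int $count = 5 ): string {
	$star = "<svg xmlns='http://www.w3.org/2000/svg' width='15' height='15' viewBox='0 0 24 24' fill='none' stroke='currentColor' stroke-width='2' stroke-linecap='round' stroke-linejoin='round' class='feather feather-star'><polygon points='12 2 15.09 8.26 22 9.27 17 14.14 18.18 21.02 12 17.77 5.82 21.02 7 14.14 2 9.27 8.91 8.26 12 2'/></svg>";
	return "<i class='rate-stars'>" . str_repeat( $star, $count ) . '</i>';
}

$links = block_ai_prepend_settings_link(
	array( '<a href="#">Deactivate</a>' ),
	'block-ai-crawlers/block-ai-crawlers.php',
	'block-ai-crawlers/block-ai-crawlers.php',
	'https://example.com/wp-admin/'
);
echo $links[0], "\n";
```

Inside WordPress, `admin_url( 'options-general.php?page=block-ai-crawlers' )` passed through `esc_url()` would be the conventional way to build the same URL.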
10 changes: 8 additions & 2 deletions inc/css/admin-style.css
@@ -16,6 +16,10 @@
color: rgb(75, 75, 75);
}

+.block-ai-container div.block-ai-info table p {
+font-size: 100%;
+}
+
.block-ai-container .link {
font-size: 100%;
padding-top: 2px;
@@ -30,10 +34,8 @@
.block-ai-container div.block-ai-info table th,
.block-ai-container div.block-ai-info table p {
color: rgb(75, 75, 75);
-font-size: 120%;
}
.block-ai-container div.block-ai-info table p {
-font-size: 115%;
margin-top: 0;
}

@@ -73,3 +75,7 @@
border-bottom: 2px solid #eee;
color: #fff;
}
+
+.block-ai-container .form-table td {
+vertical-align: top;
+}
193 changes: 104 additions & 89 deletions inc/settings-html.php
@@ -20,95 +20,110 @@
<section>
<details>
<summary><h3>Blocked Crawlers</h3></summary>
-<table class="form-table">
-<tbody>
-<tr>
-<th>ChatGPT</th>
-<td><p>Used by OpenAI</p></td>
-<td><a href="https://platform.openai.com/docs/plugins/bot" target=_blank>More Info <span class="dashicons dashicons-external link"></span></a></td>
-</tr>
-<tr>
-<th>GPTBot</th>
-<td><p>Used by OpenAI to allow ChatGPT to access the web</p></td>
-<td><a href="https://platform.openai.com/docs/gptbot" target=_blank>More Info <span class="dashicons dashicons-external link"></span></a></td>
-</tr>
-<tr>
-<th>Google Extended</th>
-<td><p>Used by Google to power Gemini (formerly known as Bard)</p></td>
-<td><a href="https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers?hl=en#common-crawlers" target=_blank>More Info <span class="dashicons dashicons-external link"></span></a></td>
-</tr>
-<tr>
-<th>FacebookBot </th>
-<td><p>Used by Meta (Facebook) for their AI</p></td>
-<td><a href="https://developers.facebook.com/docs/sharing/bot" target=_blank>More Info <span class="dashicons dashicons-external link"></span></a></td>
-</tr>
-<tr>
-<th>CommonCrawl</th>
-<td><p>Compiles datasets used to train AI models</p></td>
-<td><a href="https://commoncrawl.org/big-picture/frequently-asked-questions/" target=_blank>More Info <span class="dashicons dashicons-external link"></span></a></td>
-</tr>
-<tr>
-<th>ClaudeBot and Claude-Web</th>
-<td><p>Used by Anthropic's Claude</p></td>
-<td><a href="https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler" target=_blank>More Info <span class="dashicons dashicons-external link"></span></a></td>
-</tr>
-<tr>
-<th>Bytespider</th>
-<td><p>Used by TikTok for AI training</p></td>
-<td><a href="https://darkvisitors.com/agents/bytespider" target=_blank>More Info <span class="dashicons dashicons-external link"></span></a></td>
-</tr>
-<tr>
-<th>Omgilibot</th>
-<td><p>Used by Omigili to scrape data for AI training</p></td>
-<td><a href="https://webz.io/blog/machine-learning/common-crawl-vs-webz-io-data-which-one-works-best-for-large-language-models/" target=_blank>More Info <span class="dashicons dashicons-external link"></span></a></td>
-</tr>
-<tr>
-<th>Cohere</th>
-<td><p>Used by Cohere to scrape data for AI training</p></td>
-<td><a href="https://darkvisitors.com/agents/cohere-ai" target=_blank>More Info <span class="dashicons dashicons-external link"></span></a></td>
-</tr>
-<tr>
-<th>Diffbot</th>
-<td><p>Used by Diffbot to scrape data for AI training</p></td>
-<td><a href="https://docs.diffbot.com/reference/crawl-introduction" target=_blank>More Info <span class="dashicons dashicons-external link"></span></a></td>
-</tr>
-<tr>
-<th>ImagesiftBot</th>
-<td><p>Used by Hive's Imagesift tool that scrapes images. THis may be used for the company's generative AI product </p></td>
-<td><a href="https://imagesift.com/about" target=_blank>More Info <span class="dashicons dashicons-external link"></span></a></td>
-</tr>
-<tr>
-<th>PerplexityBot</th>
-<td><p>Used by Perplexity for their AI products</p></td>
-<td><a href="https://docs.perplexity.ai/docs/perplexitybot" target=_blank>More Info <span class="dashicons dashicons-external link"></span></a></td>
-</tr>
-<tr>
-<th>AppleBot</th>
-<td><p>Used by Apple for generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools.</p></td>
-<td><a href="https://support.apple.com/en-us/119829" target=_blank>More Info <span class="dashicons dashicons-external link"></span></a></td>
-</tr>
-<tr>
-<th>Meta-ExternalAgent / Meta-ExternalFetcher</th>
-<td><p>Used by Meta to train AI products</p></td>
-<td><a href="https://developers.facebook.com/docs/sharing/webmasters/web-crawlers" target=_blank>More Info <span class="dashicons dashicons-external link"></span></a></td>
-</tr>
-<tr>
-<th>YouBot</th>
-<td><p>Used by You.com to train AI products.</p></td>
-<td><a href="https://about.you.com/es/youbot/" target=_blank>More Info <span class="dashicons dashicons-external link"></span></a></td>
-</tr>
-<tr>
-<th>AmazonBot</th>
-<td><p>Used by Amazon's Alexa AI to provide AI answers.</p></td>
-<td><a href="https://developer.amazon.com/amazonbot" target=_blank>More Info <span class="dashicons dashicons-external link"></span></a></td>
-</tr>
-<tr>
-<th>Timpibot</th>
-<td><p>Used by Timpi; likely for their Wilson AI Product.</p></td>
-<td><a href="https://timpi.io/wilson-ai/" target=_blank>More Info <span class="dashicons dashicons-external link"></span></a></td>
-</tr>
-</tbody>
-</table>
+<table class="form-table">
+<tbody>
+<tr>
+<th>AI2Bot</th>
+<td><p>Explores sites for web content that is used to train open language models</p></td>
+<td><a href="https://allenai.org/crawler" target=_blank>More Info <span class="dashicons dashicons-external link"></span></a></td>
+</tr>
+<tr>
+<th>AmazonBot</th>
+<td><p>Used by Amazon's Alexa AI to provide AI answers.</p></td>
+<td><a href="https://developer.amazon.com/amazonbot" target=_blank>More Info <span class="dashicons dashicons-external link"></span></a></td>
+</tr>
+<tr>
+<th>AppleBot</th>
+<td><p>Used by Apple for generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools.</p></td>
+<td><a href="https://support.apple.com/en-us/119829" target=_blank>More Info <span class="dashicons dashicons-external link"></span></a></td>
+</tr>
+<tr>
+<th>Bytespider</th>
+<td><p>Used by TikTok for AI training</p></td>
+<td><a href="https://darkvisitors.com/agents/bytespider" target=_blank>More Info <span class="dashicons dashicons-external link"></span></a></td>
+</tr>
+<tr>
+<th>Cohere</th>
+<td><p>Used by Cohere to scrape data for AI training</p></td>
+<td><a href="https://darkvisitors.com/agents/cohere-ai" target=_blank>More Info <span class="dashicons dashicons-external link"></span></a></td>
+</tr>
+<tr>
+<th>ChatGPT</th>
+<td><p>Used by OpenAI</p></td>
+<td><a href="https://platform.openai.com/docs/plugins/bot" target=_blank>More Info <span class="dashicons dashicons-external link"></span></a></td>
+</tr>
+<tr>
+<th>ClaudeBot and Claude-Web</th>
+<td><p>Used by Anthropic's Claude</p></td>
+<td><a href="https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler" target=_blank>More Info <span class="dashicons dashicons-external link"></span></a></td>
+</tr>
+<tr>
+<th>CommonCrawl</th>
+<td><p>Compiles datasets used to train AI models</p></td>
+<td><a href="https://commoncrawl.org/big-picture/frequently-asked-questions/" target=_blank>More Info <span class="dashicons dashicons-external link"></span></a></td>
+</tr>
+<tr>
+<th>Diffbot</th>
+<td><p>Used by Diffbot to scrape data for AI training</p></td>
+<td><a href="https://docs.diffbot.com/reference/crawl-introduction" target=_blank>More Info <span class="dashicons dashicons-external link"></span></a></td>
+</tr>
+<tr>
+<th>FacebookBot</th>
+<td><p>Used by Meta (Facebook) for their AI</p></td>
+<td><a href="https://developers.facebook.com/docs/sharing/bot" target=_blank>More Info <span class="dashicons dashicons-external link"></span></a></td>
+</tr>
+<tr>
+<th>Google Extended</th>
+<td><p>Used by Google to power Gemini (formerly known as Bard)</p></td>
+<td><a href="https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers?hl=en#common-crawlers" target=_blank>More Info <span class="dashicons dashicons-external link"></span></a></td>
+</tr>
+<tr>
+<th>ImagesiftBot</th>
+<td><p>Used by Hive's Imagesift tool that scrapes images. This may be used for the company's generative AI product </p></td>
+<td><a href="https://imagesift.com/about" target=_blank>More Info <span class="dashicons dashicons-external link"></span></a></td>
+</tr>
+<tr>
+<th>Meta-ExternalAgent / Meta-ExternalFetcher</th>
+<td><p>Used by Meta to train AI products</p></td>
+<td><a href="https://developers.facebook.com/docs/sharing/webmasters/web-crawlers" target=_blank>More Info <span class="dashicons dashicons-external link"></span></a></td>
+</tr>
+<tr>
+<th>OAI-SearchBot</th>
+<td><p>Used by OpenAI for their SearchGPT product.</p></td>
+<td><a href="https://platform.openai.com/docs/bots" target=_blank>More Info <span class="dashicons dashicons-external link"></span></a></td>
+</tr>
+<tr>
+<th>Omgilibot</th>
+<td><p>Used by Omgili to scrape data for AI training</p></td>
+<td><a href="https://webz.io/blog/machine-learning/common-crawl-vs-webz-io-data-which-one-works-best-for-large-language-models/" target=_blank>More Info <span class="dashicons dashicons-external link"></span></a></td>
+</tr>
+<tr>
+<th>PerplexityBot</th>
+<td><p>Used by Perplexity for their AI products</p></td>
+<td><a href="https://docs.perplexity.ai/docs/perplexitybot" target=_blank>More Info <span class="dashicons dashicons-external link"></span></a></td>
+</tr>
+<tr>
+<th>Timpibot</th>
+<td><p>Used by Timpi; likely for their Wilson AI Product.</p></td>
+<td><a href="https://timpi.io/wilson-ai/" target=_blank>More Info <span class="dashicons dashicons-external link"></span></a></td>
+</tr>
+<tr>
+<th>Webzio</th>
+<td><p>Used by Webz.io for their social listening and intelligence platforms.</p></td>
+<td><a href="https://webz.io/bot.html" target=_blank>More Info <span class="dashicons dashicons-external link"></span></a></td>
+</tr>
+<tr>
+<th>Webzio-Extended</th>
+<td><p>Used by Webz.io for AI training.</p></td>
+<td><a href="https://webz.io/bot.html" target=_blank>More Info <span class="dashicons dashicons-external link"></span></a></td>
+</tr>
+<tr>
+<th>YouBot</th>
+<td><p>Used by You.com to train AI products.</p></td>
+<td><a href="https://about.you.com/es/youbot/" target=_blank>More Info <span class="dashicons dashicons-external link"></span></a></td>
+</tr>
+</tbody>
+</table>
</details>
</section>
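Every row in the new "Blocked Crawlers" table follows the same three-cell pattern: the crawler name in `<th>`, a one-line description in a `<p>`, and a "More Info" link with a dashicon. A minimal sketch, assuming a hypothetical `block_ai_render_crawler_rows()` that renders the rows from an array keyed by crawler name; it escapes with plain PHP `htmlspecialchars()` (inside WordPress, `esc_html()` and `esc_url()` would be the usual choices) and sorts by name in code, which would also place ChatGPT ahead of Cohere, an ordering the hand-written table does not follow. The sample data is copied from two rows of the table above.

```php
<?php
/**
 * Hypothetical renderer for the "Blocked Crawlers" table rows.
 * Each entry: crawler name => array( description, info URL ).
 */
function block_ai_render_crawler_rows( array $crawlers ): string {
	// Sort by crawler name, ignoring case, so the rows stay alphabetical.
	uksort( $crawlers, 'strcasecmp' );

	$html = '';
	foreach ( $crawlers as $name => $info ) {
		list( $description, $url ) = $info;
		$html .= "<tr>\n";
		$html .= '<th>' . htmlspecialchars( $name, ENT_QUOTES ) . "</th>\n";
		$html .= '<td><p>' . htmlspecialchars( $description, ENT_QUOTES ) . "</p></td>\n";
		$html .= '<td><a href="' . htmlspecialchars( $url, ENT_QUOTES ) . '" target="_blank">More Info '
			. '<span class="dashicons dashicons-external link"></span></a></td>' . "\n";
		$html .= "</tr>\n";
	}
	return $html;
}

$rows = block_ai_render_crawler_rows(
	array(
		'Cohere'  => array( 'Used by Cohere to scrape data for AI training', 'https://darkvisitors.com/agents/cohere-ai' ),
		'ChatGPT' => array( 'Used by OpenAI', 'https://platform.openai.com/docs/plugins/bot' ),
	)
);
echo $rows;
```

Keeping the crawler list in one array would also make it possible to feed the same data to the robots.txt output, though the committed plugin maintains the two lists separately.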

15 changes: 12 additions & 3 deletions readme.txt
@@ -2,13 +2,13 @@
Contributors: lastsplash
Tags: ai, robots.txt, chatgpt, crawlers
Requires at least: 5.6
-Tested up to: 6.5.3
+Tested up to: 6.6
Requires PHP: 7.4
-Stable tag: 1.3.8
+Stable tag: 1.3.9
License: GPLv2 or later
License URI: https://www.gnu.org/licenses/gpl-2.0.html

-Tell AI crawlers not to access your site to train their models.
+Tells AI companies not to access and scrape your site for AI.

== Description ==

@@ -72,6 +72,15 @@ No. Search engines follow different `robots.txt` rules.

== Changelog ==

+= 1.3.9 =
+- New: Block PetalBot
+- New: Block AI2Bot
+- New: Block Webz.io
+- New: Block OpenAI Search Bot (SearchGPT)
+- Enhancement: Alphabetize list of blocked crawlers
+- Enhancement: Indicate compatibility with WordPress v6.6
+- Enhancement: Add quick link to settings and nudge for rating on plugins page
+
= 1.3.8 =
 - Maintenance: Auto-deploy from GitHub fixed / bumped version number
