Don't want to give free content to the AI bots?

Tracy (Royal member, joined Jan 5, 2023)
If you use Cloudflare, you can use the ruleset below in your WAF panel to block the (currently) major players.
These AI bot trainers really should make their services opt-in instead of forcing an opt-out. They are hoovering up tons of free data and turning around and making a profit on it for themselves. I really prefer not to support them. If they want to use the data created on my site, they can do what one of them did with Reddit... pay for it.

Code:
(http.user_agent contains "claudebot") or (http.user_agent contains "CCBot") or (http.user_agent contains "ChatGPT-User") or (http.user_agent contains "GPTBot") or (http.user_agent contains "Omgili") or (http.user_agent contains "ImagesiftBot") or (http.user_agent contains "cohere-ai") or (http.user_agent contains "anthropic-ai") or (http.user_agent contains "Google-Extended") or (http.user_agent contains "Bytespider")

The Diffbot user agent is the default for the package but can be changed by whoever is using it, so checking your logs for bot visits remains a good idea.
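If the list of bots keeps growing, it can be easier to generate the expression than to edit it by hand. Below is a minimal sketch, assuming you paste the output into the Cloudflare expression editor yourself. The function name `build_waf_expression` and the `case_insensitive` option are made up for illustration. The `contains` operator is case-sensitive, so the optional mode wraps the field in Cloudflare's `lower()` function and lowercases each token.

```python
def build_waf_expression(tokens, case_insensitive=False):
    """Join user-agent substrings into one 'or' expression for a WAF custom rule.

    Hypothetical helper for illustration; it only builds a string.
    """
    clauses = []
    for token in tokens:
        token = token.strip()  # stray spaces inside the quotes would break the match
        if not token:
            continue
        if case_insensitive:
            clauses.append(f'(lower(http.user_agent) contains "{token.lower()}")')
        else:
            clauses.append(f'(http.user_agent contains "{token}")')
    return " or ".join(clauses)


# A few of the names from the rule in this post, used as sample input.
SAMPLE_TOKENS = ["claudebot", "CCBot", "GPTBot"]
```

Calling `print(build_waf_expression(SAMPLE_TOKENS))` gives a one-line expression in the same shape as the rule above, and passing `case_insensitive=True` gives the `lower(...)` variant.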

If you have shell access on your server and want to check your log for certain terms quickly, grep is your friend. You do need permission to read the logs, and if you have that, it is as simple as
cat access.log | grep searchword where access.log is the name of your HTTP server access log and searchword is the word you want to search for.

An example using the search word "bot" on today's log from my astro site:

Screen Shot 2024-04-18 at 6.04.22 PM.png
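If you would rather get a count per bot than scroll through grep output, something like the sketch below may help. It is only an illustration, not what produced the screenshot: the function names are made up, the match is a plain case-insensitive substring test on each log line, and the log path is whatever your server uses.

```python
from collections import Counter


def count_bot_hits(lines, tokens):
    """Count how many log lines mention each token (case-insensitive substring match)."""
    hits = Counter()
    for line in lines:
        lowered = line.lower()
        for token in tokens:
            if token.lower() in lowered:
                hits[token] += 1
    return hits


def count_bot_hits_in_file(path, tokens):
    """Same as count_bot_hits, reading the access log at `path` line by line."""
    # errors="replace" keeps odd bytes in user-agent strings from stopping the scan
    with open(path, encoding="utf-8", errors="replace") as log:
        return count_bot_hits(log, tokens)
```

For example, `count_bot_hits_in_file("access.log", ["GPTBot", "CCBot", "ClaudeBot"]).most_common()` returns the tokens sorted by number of matching lines.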
 
People helping AI are like people asked to train their own replacement, aren't they? Or are they being trained to help the destroyer of humanity (as in the Terminator movies)?
 
That aspect I really don't care about.
What I do care about is them taking content created on my site and gaining financial benefit from it without even asking or offering compensation, especially considering that I pay for the pipeline they are pulling that data through.
I have a feeling there will soon be laws on the books dealing with exactly that type of action.
 
Yeah, it's definitely a case where people are exploited because they can't read the fine print, like with the .com thing we were discussing.
 
Just to give you a hint of how heavily some of these hit your site: this sudden rise came right after I added ClaudeBot to the block list.
By far the majority of those blocks (over a span of about 7 hours) were from that one AI bot.
Screen Shot 2024-04-19 at 1.37.21 AM.png
 
Is there any downside to giving them access though?
Yeah, they usually use your content to create their response with no attribution to your site, and no way for a user to find your site and read more than the part they used.
Why would I want to give them information that is "hard earned" on many sites so that they can benefit from it while the site it came from doesn't get squat?

Depending on how a couple of cases go, I would not be surprised to see more and more sites being paywalled off, even if the paywall just limits full reading to signed-in members, with guests only getting a small "taste".
Will there be people running sites who don't care? Very likely. But there are others who feel that the AI bot scraping is infringing on their hard work.
 
Everyone is trying to get your info, so this is in the same boat. For instance, app deals giving you $1 burgers, even in this inflated economy, are designed to grab your info, which is incredibly valuable.
 
I'm at the point where I'm going to block ClaudeBot as well. This is how high my bandwidth usage has been all day. It's never been this high. There have been at least 20-30 ClaudeBots on my forum since 2pm.
 

Attachment: IMG_9536.png
I don't know for sure whether ClaudeBot honors robots.txt (from what I've read, it appears to), but it apparently does try to read it... except that it isn't even allowed to get that far now. There have also been several of these attempts back to back.

Screen Shot 2024-04-27 at 11.20.04 AM.png
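For anyone who prefers to ask nicely through robots.txt instead of (or in addition to) a WAF block, here is a rough sketch of checking what a given robots.txt would tell a crawler, using Python's standard `urllib.robotparser`. The robots.txt content and the `example.com` URL are made-up assumptions, not taken from any site in this thread. Keep in mind that robots.txt is only a request, so this shows what a well-behaved bot should do, not what any particular bot actually does.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt (an assumption, not copied from a real site):
# ask ClaudeBot to stay out entirely and leave everything else open.
ROBOTS_TXT = """\
User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if `robots_txt` permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With the sample rules, `is_allowed(ROBOTS_TXT, "ClaudeBot", "https://example.com/threads/1")` is `False`, while the same call with another agent name is `True`. Pass the short bot token rather than the full user-agent header, since the parser matches on the product name.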

An interesting read for those who were asking why to block stuff like this.
The Reddit one has some drift, but there are still some nuggets of knowledge in there.
 