Google has quietly updated its list of user-triggered fetchers with new documentation for Google NotebookLM. The significance of this seemingly minor change is that it makes clear that Google NotebookLM will not obey robots.txt.
Google NotebookLM
NotebookLM is an AI research and writing tool that lets users add a web page URL. The tool processes the content and then lets them ask a range of questions and generate summaries based on it.
Google’s tool can automatically create an interactive mind map that organizes topics from a website and extracts takeaways from it.
User-Triggered Fetchers Ignore Robots.txt
Google User-Triggered Fetchers are web agents that are triggered by users and by default ignore the robots.txt protocol.
According to Google’s User-Triggered Fetchers documentation:
“Because the fetch was requested by a user, these fetchers generally ignore robots.txt rules.”
Google-NotebookLM Ignores Robots.txt
The purpose of robots.txt is to give publishers control over bots that index web pages. But agents like the Google-NotebookLM fetcher aren’t indexing web content; they’re acting on behalf of users who are interacting with the website content through Google’s NotebookLM.
How To Block NotebookLM
Google uses the Google-NotebookLM user agent when extracting website content. So publishers who want to block users from accessing their content through NotebookLM can create rules that automatically block that user agent. For example, a simple solution for WordPress publishers is to use Wordfence to create a custom rule that blocks all website visitors using the Google-NotebookLM user agent.
Another way to do it is with .htaccess using the following rule:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Google-NotebookLM [NC]
RewriteRule .* - [F,L]
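
For sites that are not served by Apache, the same check could in principle be done at the application level. The following is only a minimal sketch, assuming a Python WSGI application. The names `is_blocked` and `BlockNotebookLM` are hypothetical and are not part of any Google or WordPress tooling. It mirrors the .htaccess rule above: a case-insensitive match on the Google-NotebookLM token in the User-Agent header, answered with a 403 Forbidden response.

```python
import re

# Case-insensitive substring match, mirroring the [NC] flag in the .htaccess rule.
BLOCKED_UA = re.compile(r"Google-NotebookLM", re.IGNORECASE)


def is_blocked(user_agent):
    """Return True when the User-Agent header contains the blocked token."""
    return bool(user_agent) and BLOCKED_UA.search(user_agent) is not None


class BlockNotebookLM:
    """Hypothetical WSGI middleware that answers 403 for matching requests."""

    def __init__(self, app):
        self.app = app  # the wrapped WSGI application

    def __call__(self, environ, start_response):
        # WSGI exposes the User-Agent request header as HTTP_USER_AGENT.
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            body = b"Forbidden"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Every other request passes through to the wrapped application.
        return self.app(environ, start_response)
```

Wrapping an existing WSGI app would look like `app = BlockNotebookLM(app)`. As with the .htaccess rule, this relies entirely on the fetcher identifying itself with that user agent string.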