GPTBot: OpenAI introduces a web crawler to enhance ChatGPT

On Monday, August 7, OpenAI unveiled GPTBot. This crawler browses websites in search of material that can help improve the company’s artificial intelligence (AI) models. OpenAI says the collected data can help make existing generative AI models better in areas such as accuracy and safety.

While searching the internet for publicly accessible material, GPTBot filters out content that is available only through paid subscriptions. It also excludes sources that collect personal information or violate OpenAI’s policies.

According to OpenAI, web pages crawled with the GPTBot user agent are filtered to remove sources that require paywall access, are known to collect personally identifiable information (PII), or contain material that violates the company’s policies.

How can I block GPTBot?

Website owners can stop OpenAI’s web crawler from gathering information on their sites if they so choose. To keep the tool away from a site’s content, add a specific rule to the site’s robots.txt file, the plain-text file that tells crawlers what they may and may not access.

To block the crawler from the entire site, add the following lines:

User-agent: GPTBot
Disallow: /
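Before deploying the rule, it can be worth checking that it behaves as intended. The sketch below uses Python's standard urllib.robotparser module to evaluate a robots.txt body containing the directives User-agent: GPTBot and Disallow: /. The helper name is_allowed, the example.com URL, and the SomeOtherBot agent are illustrative assumptions, and other parsers may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

# robots.txt body that blocks GPTBot from the whole site.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text.

    is_allowed is a hypothetical helper written for this example.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com and SomeOtherBot are placeholders, not real targets.
gptbot_allowed = is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/any/page")
other_allowed = is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/any/page")

print("GPTBot allowed:", gptbot_allowed)
print("Other crawler allowed:", other_allowed)
```

Because the rule names only GPTBot, crawlers with other user agents are unaffected by it.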

Blocking the crawler’s IP addresses is another option for those who don’t want GPTBot crawling their pages. Instructions for this are available on the OpenAI website.
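IP-based blocking is normally configured in a firewall or web server, but the underlying check is simple to illustrate. The following is a minimal sketch using Python's standard ipaddress module. The CIDR ranges shown are reserved documentation placeholders, not GPTBot's real addresses, and the names BLOCKED_RANGES and is_blocked are assumptions made for this example; the actual ranges should be taken from OpenAI's own documentation.

```python
import ipaddress

# Placeholder ranges reserved for documentation purposes.
# Replace them with the ranges that OpenAI actually publishes.
BLOCKED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/28"),
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked range."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in BLOCKED_RANGES)


print(is_blocked("192.0.2.17"))      # inside the first placeholder range
print(is_blocked("198.51.100.200"))  # outside the /28 placeholder range
```

A server could call a check like this on each incoming request and refuse those for which it returns True.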

Additionally, GPTBot access can be customized to permit crawling only in selected areas of a website. To do this, add the GPTBot token to robots.txt as follows:

User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
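The effect of a partial-access rule set can be checked the same way. This sketch, again based on Python's standard urllib.robotparser, evaluates a robots.txt body with User-agent: GPTBot, Allow: /directory-1/, and Disallow: /directory-2/. The helper gptbot_can_fetch, the example.com host, and the page paths are hypothetical, and the result for paths not covered by any rule reflects this parser's default of allowing them.

```python
from urllib.robotparser import RobotFileParser

# robots.txt body that opens one directory to GPTBot and closes another.
PARTIAL_ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
"""


def gptbot_can_fetch(path: str) -> bool:
    """Return True if GPTBot may fetch the given path on a placeholder host."""
    parser = RobotFileParser()
    parser.parse(PARTIAL_ROBOTS_TXT.splitlines())
    return parser.can_fetch("GPTBot", "https://example.com" + path)


for path in ("/directory-1/page.html", "/directory-2/page.html", "/elsewhere.html"):
    print(path, gptbot_can_fetch(path))
```

Paths that match neither directive are treated as allowed by this parser, so a site that wants everything else closed would also need a broader Disallow rule.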