Controlling Search Engines and Web Crawlers Using robots.txt
The robots.txt file allows you to control which parts of your site search engines and other web crawlers can access. Place the file in your document root directory (public_html) and use its directives to specify how crawlers should behave.
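After uploading the file, you can confirm that it is being served from the site root. The following sketch uses Python's standard urllib.request module to fetch it; example.com is a placeholder for your own domain.

import urllib.request

# Request the robots.txt file from the site root (replace example.com with your domain).
with urllib.request.urlopen("https://example.com/robots.txt") as response:
    print(response.status)                  # 200 means the file is reachable
    print(response.read().decode("utf-8"))  # the directives you uploaded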
Important Note
Directives in a robots.txt file are requests, not enforceable rules. Most search engines respect them, but some crawlers may ignore them. Do not rely on robots.txt to hide sensitive content.
Common Directives
1. Allow All Crawlers to Access All Files
User-agent: *
Disallow:
- User-agent: * applies to all crawlers.
- Disallow: with no value allows access to all files.
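If you want to check how a crawler that honors robots.txt would interpret this rule, Python's standard urllib.robotparser module can evaluate it locally. This sketch feeds the two lines above to the parser in memory; the user agent names and paths are only examples.

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow:",
])

# An empty Disallow value permits every URL for every user agent.
print(rp.can_fetch("Googlebot", "/index.html"))     # True
print(rp.can_fetch("Bingbot", "/scripts/run.cgi"))  # True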
2. Block All Crawlers from Accessing the Site
User-agent: *
Disallow: /
- The / path blocks crawlers from all files on the site.
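The same kind of local check with urllib.robotparser shows that this rule denies every path; the URLs below are illustrative.

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /"])

# Disallow: / matches every path, so nothing may be crawled.
print(rp.can_fetch("Googlebot", "/"))            # False
print(rp.can_fetch("Googlebot", "/about.html"))  # False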
3. Block Crawlers from a Specific Directory
User-agent: *
Disallow: /scripts/
- Crawlers are prevented from accessing the /scripts/ directory.
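To see the scope of a directory rule, the same urllib.robotparser check can be applied; /scripts/run.cgi and /about.html are example paths.

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /scripts/"])

# Only paths beginning with /scripts/ are blocked; everything else stays crawlable.
print(rp.can_fetch("Googlebot", "/scripts/run.cgi"))  # False
print(rp.can_fetch("Googlebot", "/about.html"))       # True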
4. Block Crawlers from a Specific File
User-agent: *
Disallow: /documents/index.html
- Crawlers cannot access the file /documents/index.html.
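A rule that names a single file can be verified the same way; /documents/report.pdf is an example of a path that remains crawlable.

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /documents/index.html"])

# The rule blocks that path (and any path that starts with it);
# other files in the same directory remain crawlable.
print(rp.can_fetch("Googlebot", "/documents/index.html"))  # False
print(rp.can_fetch("Googlebot", "/documents/report.pdf"))  # True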
5. Control Crawl Interval
User-agent: *
Crawl-delay: 30
- Crawlers that honor the Crawl-delay directive are instructed to wait at least 30 seconds between requests.
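Python's urllib.robotparser also exposes this value (crawl_delay() requires Python 3.6 or later), so a polite crawler can use it to pace its requests. A brief sketch, again parsing the rules in memory:

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Crawl-delay: 30"])

# crawl_delay() returns the delay in seconds for the given user agent,
# or None when no Crawl-delay directive applies to it.
print(rp.crawl_delay("Googlebot"))  # 30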
For more information about the robots.txt file, visit http://www.robotstxt.org.