Controlling Search Engines and Web Crawlers Using robots.txt

The robots.txt file lets you indicate which parts of your site search engines and other web crawlers may access. Place the file in your site's document root (public_html) and use directives to specify crawler behavior.


Important Note

Directives in a robots.txt file are requests, not enforceable rules. Most search engines respect them, but some crawlers may ignore them. Do not rely on robots.txt to hide sensitive content.


Common Directives

1. Allow All Crawlers to Access All Files

User-agent: *
Disallow:
  • User-agent: * applies to all crawlers.

  • Disallow: with no value allows access to all files.

2. Block All Crawlers from Accessing the Site

User-agent: *
Disallow: /
  • The / path blocks crawlers from all files on the site.
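You can check how a standards-compliant parser interprets these first two policies with Python's built-in urllib.robotparser module. This is a minimal sketch; the example.com URLs are placeholders:

```python
from urllib.robotparser import RobotFileParser

# "Allow all" policy: an empty Disallow value permits every path.
allow_all = RobotFileParser()
allow_all.parse(["User-agent: *", "Disallow:"])

# "Block all" policy: Disallow: / matches every path on the site.
block_all = RobotFileParser()
block_all.parse(["User-agent: *", "Disallow: /"])

print(allow_all.can_fetch("*", "http://example.com/any/page.html"))  # True
print(block_all.can_fetch("*", "http://example.com/any/page.html"))  # False
```

Remember that this only tells you what a well-behaved crawler should do; as noted above, nothing forces a crawler to obey.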

3. Block Crawlers from a Specific Directory

User-agent: *
Disallow: /scripts/
  • Crawlers are prevented from accessing the /scripts/ directory.

4. Block Crawlers from a Specific File

User-agent: *
Disallow: /documents/index.html
  • Crawlers cannot access the file /documents/index.html.
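Directory and file rules can be combined in one file, and Disallow paths match by prefix. The sketch below parses a hypothetical robots.txt containing both rules 3 and 4 and checks a few example.com paths (all placeholders):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /scripts/",             # rule 3: the whole directory
    "Disallow: /documents/index.html"  # rule 4: a single file
])

# Everything under /scripts/ is matched by the directory rule.
print(rp.can_fetch("*", "http://example.com/scripts/app.js"))        # False
# The single-file rule blocks only that exact path prefix...
print(rp.can_fetch("*", "http://example.com/documents/index.html"))  # False
# ...so a sibling file in the same directory is still allowed.
print(rp.can_fetch("*", "http://example.com/documents/other.html"))  # True
```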

5. Control Crawl Interval

User-agent: *
Crawl-delay: 30
  • Crawlers that honor this directive are asked to wait at least 30 seconds between requests. Note that Crawl-delay is not part of the original robots.txt standard, and some major crawlers (for example, Googlebot) ignore it.
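Python's parser is one of the implementations that does read Crawl-delay, so you can confirm the value a compliant crawler would see. A minimal sketch:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse(["User-agent: *", "Crawl-delay: 30"])

# crawl_delay() returns the delay for the matching User-agent group,
# or None if the file sets no Crawl-delay for that agent.
print(rp.crawl_delay("*"))  # 30
```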


For more information about the robots.txt file, visit http://www.robotstxt.org.
