A robots.txt file is important if you care about how search engines crawl and rank your website, yet many websites don't provide one. It tells search engine spiders and other robots which paths on your site are off limits, which keeps well-behaved crawlers out of areas you don't want visited. Keep in mind that robots.txt is only a request. Robots such as email harvesters and image scrapers often ignore it. Because the file itself is publicly readable, listing personal information or secret files in it does not hide them and may even point visitors straight to them.
What is Robots.txt
A robots.txt file is a plain text file that is always located in your web server's root directory (for example, http://www.example.com/robots.txt). It contains restrictions for web spiders, telling them where they are allowed to crawl. In other words, robots.txt defines rules for search engine spiders (robots) about what to visit and what to skip. Web robots are not required to respect robots.txt files, but most well-written spiders follow the rules you define.
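Because robots look for the file only at the top level of a host, a crawler can work out where robots.txt lives from any page URL on that site. A minimal Python sketch using the standard urllib.parse module (the function name `robots_txt_url` is just an illustrative choice):

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the URL where a robot would look for robots.txt for page_url's site."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port); replace path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.example.com/blog/2010/05/post.html?ref=home"))
# http://www.example.com/robots.txt
```

A file placed at http://www.example.com/blog/robots.txt would simply be ignored, since robots never look in subdirectories.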
How to Create Robots.txt
The format of the robots.txt file is special. It consists of records, and each record consists of two kinds of fields: a User-agent line and one or more Disallow lines. Records are separated by blank lines. Each line has the format:
<Field> ":" <value>
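To make the record structure concrete, here is a small Python sketch that splits a robots.txt file into records of (field, value) pairs. It assumes, as in the original robots exclusion standard, that records are separated by blank lines and that `#` starts a comment; `parse_records` is a hypothetical helper, not a standard library function.

```python
from __future__ import annotations


def parse_records(text: str) -> list[list[tuple[str, str]]]:
    """Split robots.txt text into records, each a list of (field, value) pairs."""
    records: list[list[tuple[str, str]]] = []
    current: list[tuple[str, str]] = []
    for raw in text.splitlines():
        if not raw.strip():
            # A blank line ends the current record.
            if current:
                records.append(current)
                current = []
            continue
        line = raw.split("#", 1)[0].strip()  # drop a trailing comment
        field, sep, value = line.partition(":")
        if not sep:
            continue  # comment-only line or a line without ":"; skip it
        # Field names are case-insensitive; whitespace around the value is ignored.
        current.append((field.strip().lower(), value.strip()))
    if current:
        records.append(current)
    return records


example = """User-agent: googlebot
Disallow: /cgi-bin/

User-agent: *
Disallow:
"""
print(parse_records(example))
# [[('user-agent', 'googlebot'), ('disallow', '/cgi-bin/')], [('user-agent', '*'), ('disallow', '')]]
```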
It is safest to create the robots.txt file with Unix line endings. Most good text editors have a Unix mode, or your FTP client *should* do the conversion for you. Do not attempt to create a robots.txt file with an HTML editor that does not specifically have a plain-text mode.
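If you generate robots.txt from a script rather than an editor, you can force Unix line endings yourself. A minimal Python sketch (the helper names are hypothetical) that opens the file with `newline="\n"`, which stops Python from translating line breaks to CRLF on Windows, and then checks for stray carriage returns:

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def write_robots_txt(path: Path, lines: list[str]) -> None:
    """Write the given lines as plain text, each ending in a Unix line feed."""
    # newline="\n" disables translation of "\n" to os.linesep (CRLF on Windows).
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for line in lines:
            f.write(line + "\n")


def uses_unix_line_endings(path: Path) -> bool:
    """Return True if the file contains no carriage-return bytes."""
    return b"\r" not in path.read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "robots.txt"
    write_robots_txt(target, ["User-agent: *", "Disallow: /cgi-bin/"])
    print(uses_unix_line_endings(target))  # True
```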
User-agent
The User-agent line specifies which robot the record applies to. For example:
User-agent: googlebot
You may also use the wildcard character "*" to specify all robots:
User-agent: *
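Python's standard `urllib.robotparser` module follows these rules, so it is a convenient way to see how a record for a specific robot and a `*` record interact. In this sketch (the paths are made up), googlebot gets its own record and every other robot falls back to the wildcard record:

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """User-agent: googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# googlebot matches its own record, so only /private/ is off limits for it.
print(parser.can_fetch("googlebot", "http://www.example.com/index.html"))      # True
print(parser.can_fetch("googlebot", "http://www.example.com/private/a.html"))  # False
# Any other robot falls back to the "*" record, which disallows everything.
print(parser.can_fetch("SomeOtherBot", "http://www.example.com/index.html"))   # False
```

A robot that has its own record ignores the `*` record entirely, which is why googlebot can still fetch /index.html here.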
You can find user agent names in your own server logs by looking for requests to robots.txt. Most major search engines use short names for their spiders, such as googlebot.
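As a rough sketch of that log check, the Python snippet below counts the user agents that requested /robots.txt. It assumes your server writes the common Apache/Nginx "combined" log format, where the user agent is the last quoted field; the regular expression and the two sample lines are illustrative and will need adjusting for other formats.

```python
import re
from collections import Counter

# A request for /robots.txt in the "combined" log format, capturing the final quoted
# field (the user agent). Assumes that format; adjust for your server's configuration.
ROBOTS_REQUEST = re.compile(
    r'"(?:GET|HEAD) /robots\.txt(?:\?[^"]*)? HTTP/[\d.]+" \d{3} \S+ "[^"]*" "([^"]*)"'
)


def robots_txt_user_agents(log_lines):
    """Count how often each user agent requested /robots.txt."""
    counts = Counter()
    for line in log_lines:
        match = ROBOTS_REQUEST.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample lines using documentation IP addresses.
sample_log = [
    '192.0.2.10 - - [10/Oct/2010:13:55:36 -0700] "GET /robots.txt HTTP/1.1" 200 68 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.5 - - [10/Oct/2010:13:56:01 -0700] "GET /index.html HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (Windows NT 6.1)"',
]

for agent, count in robots_txt_user_agents(sample_log).most_common():
    print(count, agent)
```

The full user-agent string contains the short name (here, Googlebot) that you would use on the User-agent line.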
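Disallow
The Disallow line specifies a path that the robot should not visit. Any URL whose path starts with that value is off limits; for example, "Disallow: /help" blocks both /help.html and /help/index.html. Use a separate Disallow line for each path you want to block.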
Example robots.txt file
User-agent: googlebot
Disallow: /cgi-bin/
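One way to double-check what a record like this actually blocks is to feed it to Python's standard `urllib.robotparser` module; the URLs below are placeholders:

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: googlebot", "Disallow: /cgi-bin/"])

# Any path that starts with /cgi-bin/ is off limits for googlebot...
print(parser.can_fetch("googlebot", "http://www.example.com/cgi-bin/search.cgi"))  # False
# ...but the rest of the site is allowed.
print(parser.can_fetch("googlebot", "http://www.example.com/index.html"))  # True
# No record names this robot and there is no "*" record, so nothing is blocked for it.
print(parser.can_fetch("bingbot", "http://www.example.com/cgi-bin/search.cgi"))  # True
```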
If you leave the Disallow value blank, all files may be retrieved. At least one Disallow line must be present in each record for it to be valid. A completely empty robots.txt file is treated the same as having no robots.txt file at all.
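The same standard-library parser can confirm these special cases: a blank Disallow value allows everything, and an empty file places no restrictions at all. For contrast, "Disallow: /" blocks everything. A minimal sketch (the `allows` helper is hypothetical):

```python
from urllib.robotparser import RobotFileParser


def allows(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


url = "http://www.example.com/cgi-bin/search.cgi"
print(allows("User-agent: *\nDisallow:\n", "googlebot", url))    # True: blank Disallow allows all
print(allows("User-agent: *\nDisallow: /\n", "googlebot", url))  # False: "/" matches every path
print(allows("", "googlebot", url))                               # True: empty file, no rules
```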