How can I control which pages are indexed by the Search Engines
Source: Internet | Editor: 程序博客网 | Date: 2024/06/05 15:44
By adding a robots.txt file to the root directory of your website, you can help control the indexing of your site by robots that ignore the <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"> convention.
The mystery of the robots.txt file revealed
Author: turtle
Control which of your pages are NOT indexed with a robots.txt file
You should add a robots.txt file to the root directory of every website you run to help control how your site is indexed by robots that ignore the <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"> convention. In this file you list the pages and folders that you DO NOT want crawled and indexed (such as password-protected folders or folders that contain only images). The robots.txt file is simple yet powerful, and every website should have one in its root directory.
The Terminology
Create a new file with Notepad and call it robots.txt
The two statements used in a robots.txt file are User-agent: and Disallow:
User-agent: * By using the * wildcard you address ALL robots. To address individual robots, list each one in its own User-agent: statement, followed by the Disallow: statements naming the folders and files you DO NOT want that robot to index. A robot is identified by the name it announces in its user-agent string; robots.txt cannot match a robot by IP address.
Tip: Use the * wildcard to address all robots; it is the safest way.
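The difference between a named robot and the * wildcard can be checked locally. The sketch below uses urllib.robotparser from Python's standard library; the robot names ExampleBot and OtherBot and the paths are made up for illustration, and a real crawler may interpret the rules slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group for a robot that calls itself "ExampleBot",
# and a catch-all group for every other robot.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /tutorials/

User-agent: *
Disallow: /tutorials/images/
"""

agent_parser = RobotFileParser()
agent_parser.parse(ROBOTS_TXT.splitlines())

# ExampleBot is matched by name and gets its own, stricter group.
print(agent_parser.can_fetch("ExampleBot", "/tutorials/index.html"))      # False
# Any other robot falls through to the * group.
print(agent_parser.can_fetch("OtherBot", "/tutorials/index.html"))        # True
print(agent_parser.can_fetch("OtherBot", "/tutorials/images/logo.gif"))   # False
```

Note that the groups are separated by a blank line, and that the named group replaces the * group for that robot rather than adding to it.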
Disallow: /folder/ List each folder that you do not want robots to index, one Disallow: statement per folder.
Warning: Disallow: / used without any folder name tells robots not to index ANY page of the website.
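To see how much a single slash matters, the following sketch compares Disallow: / with an empty Disallow: statement, which blocks nothing. It relies on Python's standard urllib.robotparser; the helper name allowed and the robot name AnyBot are assumptions made for this example.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt, path, agent="AnyBot"):
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

BLOCK_EVERYTHING = "User-agent: *\nDisallow: /\n"   # a bare slash blocks the whole site
BLOCK_NOTHING = "User-agent: *\nDisallow:\n"        # an empty value blocks nothing

print(allowed(BLOCK_EVERYTHING, "/index.html"))  # False
print(allowed(BLOCK_NOTHING, "/index.html"))     # True
```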
ALL files and folders in the directory named in a Disallow: statement, as well as everything beneath it, will NOT be indexed by robots.
Sample folders on this website that we would not want spiders to index for the search engines:
Disallow: /tutorials/meta/
Disallow: /tutorials/images/
Disallow: /tutorials/assets/
Disallow: /tutorials/404redirect/
Example: Disallow: /tutorials/
Result: With this statement, robots will not index any file or subfolder located within the tutorials folder. That covers all the folders listed in the sample above as well as any other subfolders of the tutorials directory.
This means that /meta, /images, /assets, /404redirect, and any other folders, along with all of the files in those folders, will not be seen by indexing robots.
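The reason one statement covers every subfolder is that a Disallow: value is matched as a prefix of the requested path. A minimal sketch of that behaviour, again using Python's standard urllib.robotparser; the file names index.html and logo.gif are invented for the example.

```python
from urllib.robotparser import RobotFileParser

prefix_parser = RobotFileParser()
prefix_parser.parse(["User-agent: *", "Disallow: /tutorials/"])

paths = [
    "/tutorials/meta/index.html",   # inside a subfolder of /tutorials/
    "/tutorials/images/logo.gif",   # inside another subfolder
    "/tutorials/404redirect/",      # a subfolder itself
    "/index.html",                  # outside /tutorials/
]

# Collect every path whose beginning matches the disallowed prefix.
blocked = [p for p in paths if not prefix_parser.can_fetch("*", p)]
print(blocked)
# ['/tutorials/meta/index.html', '/tutorials/images/logo.gif', '/tutorials/404redirect/']
```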
You may also list specific files that you do not want indexed in a robots.txt file.
Sample files on this website that we would not want spiders to index for the search engines:
Disallow: /tutorials/meta_tags.html
Disallow: /tutorials/custom_error_page.html
# Comments can be placed in a robots.txt file by starting the line with #
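File-level rules and comment lines can be combined in the same file. The sketch below, which assumes Python's standard urllib.robotparser as a stand-in for a well-behaved robot, blocks the two sample pages while leaving the rest of the folder open; other_page.html is a hypothetical file name.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
# block two individual pages, leave the rest of /tutorials/ open
User-agent: *
Disallow: /tutorials/meta_tags.html
Disallow: /tutorials/custom_error_page.html
"""

file_parser = RobotFileParser()
file_parser.parse(RULES.splitlines())  # the comment line is skipped by the parser

verdicts = {}
for path in ("/tutorials/meta_tags.html",
             "/tutorials/custom_error_page.html",
             "/tutorials/other_page.html"):
    verdicts[path] = "allowed" if file_parser.can_fetch("*", path) else "blocked"
    print(path, verdicts[path])
```

Only the two listed pages come back as blocked; the third path is still open to robots.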
The Examples
Download a sample robots.txt or see below for an example.
################################
# sample robots.txt file for this website
#
# addresses all robots by using wild card *
#
User-agent: *
# list folders robots are not allowed to index
Disallow: /tutorials/meta/
Disallow: /tutorials/images/
Disallow: /tutorials/assets/
Disallow: /tutorials/404redirect/
#
# list specific files robots are not allowed to index
#
Disallow: /tutorials/meta_tags.html
Disallow: /tutorials/custom_error_page.html
#
# End of robots.txt file
################################
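If you maintain the list of excluded folders and files elsewhere, a file like the sample can be generated rather than typed by hand. This is only a sketch: render_robots_txt is a hypothetical helper, style.css and index.html are invented test paths, and the final check uses Python's standard urllib.robotparser rather than any particular search engine's robot.

```python
from urllib.robotparser import RobotFileParser

def render_robots_txt(folders, files, agent="*"):
    """Build robots.txt text: one Disallow line per folder and per file."""
    lines = ["# generated robots.txt", f"User-agent: {agent}"]
    lines += [f"Disallow: {path}" for path in list(folders) + list(files)]
    return "\n".join(lines) + "\n"

text = render_robots_txt(
    folders=["/tutorials/meta/", "/tutorials/images/",
             "/tutorials/assets/", "/tutorials/404redirect/"],
    files=["/tutorials/meta_tags.html", "/tutorials/custom_error_page.html"],
)
print(text)

# Sanity-check the generated rules before uploading the file.
check = RobotFileParser()
check.parse(text.splitlines())
assert not check.can_fetch("*", "/tutorials/assets/style.css")
assert check.can_fetch("*", "/tutorials/index.html")
```

The printed text can then be saved as robots.txt in the root directory of the site.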
Related Tutorials
Introduction to Meta Tags
by turtle
URL: http://www.dwfaq.com/Miscellaneous/intro_to_metas.asp
Related Reference and Resources
You can read more about spiders (a.k.a. robots), META tags and what they do, as well as search engine optimization at the following URLs:
J.K. Bowman's Spider Food
URL: http://spider-food.net/handling-robots-b.html
Search Engine World
URL: http://www.searchengineworld.com/robots/robots_tutorial.htm
Search Engine Guide
URL: http://www.searchengineguide.com/1stsearchranking/2001/robots.html
Search Tools
URL: http://www.searchtools.com/robots/robots-txt.html
ZDNet
URL: http://www.zdnet.com/devhead/stories/articles/0,4413,1600632,00.html