A few terms (robots.txt / POST / Phrase chunking)


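Overview: While working through a workflow recently, I ran into a few terms I had not come across before. I am collecting some of them here for future reference.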

1 robots.txt
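robots.txt is a plain-text file. By declaring in it the parts of a site that robots should not visit, a site owner can keep some or all of the site's content from being indexed by well-behaved search engines, or restrict search engines to indexing only specified content. When a search robot visits a site, it first checks whether robots.txt exists in the site's root directory. If the file is found, the robot determines its crawl scope from the file's contents; if the file does not exist, the robot simply crawls along the links.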

It works like this: a robot wants to visit a Web site URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt, and finds:

User-agent: *
Disallow: /

The "User-agent: *" means this section applies to all robots.

The "Disallow: /" tells the robot that it should not visit any pages on the site.
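The check described above can be sketched with Python's standard-library `urllib.robotparser`. This is only an illustration: the robot name `ExampleBot`, the second rule set with `/private/`, and the helper `build_parser` are assumptions made for the example, and the rules are parsed from strings rather than fetched over the network.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content from a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# The rule set from the text: every robot is barred from the whole site.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
# A hypothetical variant that only bars the /private/ directory.
BLOCK_PRIVATE = "User-agent: *\nDisallow: /private/\n"

block_all = build_parser(BLOCK_ALL)
block_private = build_parser(BLOCK_PRIVATE)

URL = "http://www.example.com/welcome.html"
PRIVATE_URL = "http://www.example.com/private/notes.html"

print(block_all.can_fetch("ExampleBot", URL))              # False
print(block_private.can_fetch("ExampleBot", URL))          # True
print(block_private.can_fetch("ExampleBot", PRIVATE_URL))  # False
```

A real crawler would call `set_url(...)` and `read()` to download the file first; whether it then honors the answer is entirely up to the crawler.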

There are two important considerations when using /robots.txt:

  • robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities, and email address harvesters used by spammers, will pay no attention.
  • the /robots.txt file is a publicly available file. Anyone can see what sections of your server you don't want robots to use.

So don't try to use /robots.txt to hide information.

Source: http://www.robotstxt.org/robotstxt.html

2 Part-of-speech tagging

Part-of-speech tagging (POS tagging or POST), also called grammatical tagging, is the process of marking up each word in a text as corresponding to a particular part of speech, based on both its definition and its context, i.e., its relationship with adjacent and related words in a phrase, sentence, or paragraph. A simplified form of this is commonly taught to school-age children, in the identification of words as nouns, verbs, adjectives, adverbs, etc. Once performed by hand, POS tagging is now done in the context of computational linguistics, using algorithms which associate discrete terms, as well as hidden parts of speech, with a set of descriptive tags.
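To make "definition plus context" concrete, here is a toy tagger. It is a minimal sketch, not how production taggers work: the lexicon, the tag names, and the single context rule are all assumptions invented for this example.

```python
# Hypothetical mini-lexicon: each word lists its possible tags,
# most likely reading first (the "definition" part).
LEXICON = {
    "i": ["PRON"],
    "the": ["DET"],
    "a": ["DET"],
    "can": ["AUX", "NOUN"],
    "run": ["VERB", "NOUN"],
    "opened": ["VERB"],
}


def tag(words):
    """Return (word, tag) pairs using lexicon lookup plus one context rule."""
    tagged = []
    for word in words:
        # Unknown words default to NOUN (an arbitrary choice for this sketch).
        candidates = LEXICON.get(word.lower(), ["NOUN"])
        prev_tag = tagged[-1][1] if tagged else None
        if prev_tag == "DET" and "NOUN" in candidates:
            # The "context" part: after a determiner, prefer the noun reading.
            choice = "NOUN"
        else:
            choice = candidates[0]
        tagged.append((word, choice))
    return tagged


print(tag("I can run".split()))
# [('I', 'PRON'), ('can', 'AUX'), ('run', 'VERB')]
print(tag("I opened the can".split()))
# [('I', 'PRON'), ('opened', 'VERB'), ('the', 'DET'), ('can', 'NOUN')]
```

The word "can" receives a different tag in each sentence, which is the ambiguity that context-aware tagging is meant to resolve.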

 

Source: en.wikipedia.org/wiki/Part-of-speech_tagging

 

3 Phrase chunking

Phrase chunking is a natural language processing task that segments a sentence into its subconstituents, such as noun, verb, and prepositional phrases.
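As a rough illustration, the sketch below groups already-tagged words into flat, non-overlapping chunks. The tag set, the tag-to-chunk mapping `CHUNK_OF`, and the rule that a determiner opens a new noun phrase are assumptions made for this example; the PP chunk here holds only the preposition, with the noun phrase after it kept as a separate chunk.

```python
# Hypothetical mapping from part-of-speech tag to chunk label.
CHUNK_OF = {
    "DET": "NP", "ADJ": "NP", "NOUN": "NP", "PRON": "NP",
    "AUX": "VP", "VERB": "VP",
    "ADP": "PP",
}


def chunk(tagged):
    """Group (word, tag) pairs into (label, [words]) chunks."""
    chunks = []
    for word, tag in tagged:
        label = CHUNK_OF.get(tag, "O")  # "O" marks a token outside any chunk
        opens_new_np = tag == "DET"     # a determiner always starts a fresh NP
        same_chunk = (
            chunks
            and chunks[-1][0] == label
            and label != "O"
            and not opens_new_np
        )
        if same_chunk:
            chunks[-1][1].append(word)
        else:
            chunks.append((label, [word]))
    return chunks


sentence = [
    ("the", "DET"), ("dog", "NOUN"), ("sat", "VERB"),
    ("on", "ADP"), ("the", "DET"), ("mat", "NOUN"),
]
print(chunk(sentence))
# [('NP', ['the', 'dog']), ('VP', ['sat']), ('PP', ['on']), ('NP', ['the', 'mat'])]
```

Chunking normally runs after part-of-speech tagging, which is why the input here is a list of (word, tag) pairs rather than raw text.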

Source: en.wikipedia.org/wiki/Phrase_chunking

 