Lexxe, a Third-Generation Search Engine

Source: Internet | Published by: 知和曰常 知常曰明 | Editor: 程序博客网 | Time: 2024/04/26 21:39
 Recently I came across an introduction to Lexxe online.
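    Dr. Hongliang Qiao (乔鸿亮), a Chinese-Australian with a PhD in computational linguistics, recently created Lexxe, a third-generation search engine whose defining feature is "linguistic computing".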
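    The advantage of linguistic computing is that it brings artificial intelligence to search. It recognizes different sentence types, parses them grammatically, and works out what the user intends, so it can return direct and useful answers.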
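    Typing www.lexxe.com directly into the address bar does not get you in. The site only displays a short message: because traffic has grown, the server opens only at set times each day. A Google search turned up a link into Lexxe that still works: http://www.lexxe.com/main.cfm?sstring=Where+is+Nanjing+Road%3F&clickcluster=fmclk&sstringtemp=fmstr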
   I wanted to find out how Lexxe works, and the site has a passage on it. My translation follows:
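   For the short question "which countries does Thailand share border with?":
    1 Lexxe converts the sentence type, turning the question into a declarative sentence: "Thailand shares border with which countries".
    2 It decides which words to remove from the declarative sentence. "which countries" is the question part, made up of a question word and a noun, so these two words are removed.
    3 Lexxe has to determine on which side of the pattern the possible answers may appear, and how far away from it. Sometimes the answer appears on one side of the declarative pattern and sometimes on both sides. The distance between the answer and the pattern is configurable. Once these settings are in place, Lexxe runs the query and retrieves the first 100 results.
    4 If any sentence among these 100 results contains the structure "Thailand shares the border with", the n words to the right of the pattern are extracted.
    5 Linguistic and statistical processing is applied to the results to obtain meaningful phrases (groups of words, such as "Laos and Myanmar").
    6 The statistically meaningful words and phrases are scored and ranked.
    7 Finally, the best answer and the answers scoring closest to it are selected as the answer to the question.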
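Steps 1 and 2 can be sketched in Python as a toy. It assumes a single hand-written regular-expression rule for questions of the form "which/what <noun> does <subject> <verb> ...?". The passage does not describe how Lexxe actually performs the conversion, and it presumably relies on real grammatical analysis. The names `WH_DOES`, `third_person`, `question_to_statement` and `strip_question_part` are hypothetical.

```python
import re

# Toy rule, an assumption rather than Lexxe's actual grammar: it only handles
# "<which|what> <head noun> does <one-word subject> <verb> <rest>?".
WH_DOES = re.compile(
    r"^(?P<wh>which|what)\s+(?P<noun>\w+)\s+does\s+(?P<subj>\w+)\s+"
    r"(?P<verb>\w+)\s*(?P<rest>.*?)\s*\?$",
    re.IGNORECASE,
)


def third_person(verb: str) -> str:
    """Naive third-person singular: share -> shares, watch -> watches."""
    if verb.endswith(("s", "sh", "ch", "x", "z")):
        return verb + "es"
    return verb + "s"


def question_to_statement(question: str) -> tuple[str, str]:
    """Step 1: turn the question into a statement; also return the question part."""
    m = WH_DOES.match(question.strip())
    if m is None:
        raise ValueError(f"unsupported question form: {question!r}")
    question_part = f"{m['wh'].lower()} {m['noun']}"  # question word + head noun
    parts = [m["subj"], third_person(m["verb"]), m["rest"], question_part]
    return " ".join(p for p in parts if p), question_part


def strip_question_part(statement: str, question_part: str) -> str:
    """Step 2: drop the question part, leaving the pattern to search for."""
    return statement.removesuffix(question_part).strip()


statement, qpart = question_to_statement("which countries does Thailand share border with?")
print(statement)                              # Thailand shares border with which countries
print(strip_question_part(statement, qpart))  # Thailand shares border with
```

The statement always ends with the question part, so removing it as a suffix avoids accidentally deleting the same words elsewhere in the sentence. The code needs Python 3.9 or later for `str.removesuffix`.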
The original text:
Q: Can you briefly illustrate how Lexxe's short question answering works?

A: Let's look at "which countries does Thailand share border with?" as an example. First of all, Lexxe carries out a sentential conversion of the query from question to statement type. That is, it produces a new sentence like "Thailand shares border with which countries." Secondly, it needs to decide what to remove from the new statement. In this case, "which countries" is the question part, involving a question word and a head noun, so the two words are removed. Thirdly, Lexxe needs to know on which side and how far away the possible answers may occur. Sometimes the answer may occur on one side of the statement pattern and sometimes it occurs on both sides. The distance of the answer from the pattern will also be configured. When these are ready, Lexxe conducts search engine retrieval and obtains the first 100 results.
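The third step configures which side of the pattern to look on and how far to look. Applying that configuration to a single retrieved sentence can be sketched in Python. The sketch assumes word-level tokenization that drops punctuation and exact, case-insensitive matching of the pattern, and it leaves out the search engine retrieval itself. `AnswerWindow` and `extract_candidates` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Literal


@dataclass
class AnswerWindow:
    """Hypothetical config: where answers may appear relative to the pattern."""
    side: Literal["left", "right", "both"]
    distance: int  # maximum number of words to take on each configured side


def extract_candidates(sentence: str, pattern: str, window: AnswerWindow) -> list[str]:
    """Return the word spans inside `window` around each occurrence of `pattern`."""
    words = re.findall(r"\w+", sentence)  # crude tokenizer: punctuation is dropped
    lowered = [w.lower() for w in words]
    pat = re.findall(r"\w+", pattern.lower())
    if not pat:
        return []
    spans = []
    for i in range(len(words) - len(pat) + 1):
        if lowered[i:i + len(pat)] != pat:  # case-insensitive exact match
            continue
        if window.side in ("left", "both"):
            left = words[max(0, i - window.distance):i]
            if left:
                spans.append(" ".join(left))
        if window.side in ("right", "both"):
            end = i + len(pat)
            right = words[end:end + window.distance]
            if right:
                spans.append(" ".join(right))
    return spans


window = AnswerWindow(side="right", distance=3)
print(extract_candidates("Thailand shares the border with Laos and Myanmar.",
                         "Thailand shares the border with", window))
# ['Laos and Myanmar']
```

Keeping the side and the distance in one small config object mirrors the passage's point that both are configured ahead of time. The same function then serves one-sided and two-sided patterns.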

Fourthly, if a sentence in the top 100 results has the pattern "Thailand shares the border with", then, say, up to n words on the right-hand side of the pattern will be retrieved. Fifthly, linguistic and statistical processing will be carried out on the result to retrieve meaningful phrases (a group of words, such as "Laos and Myanmar"), if there are any. Sixthly, statistically significant words and phrases will be calculated and ranked in descending order of importance. Finally, the best answer and those that are very close in score to the best one will be selected as the answer to the question.
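Steps five to seven can be approximated with simple stand-ins, because the passage does not say how Lexxe forms phrases or measures statistical significance. In this Python sketch, phrases are formed by trimming stop words from the ends of each candidate span. They are then scored by raw frequency, and every phrase scoring at least a `closeness` fraction of the best score is kept. `STOP_WORDS`, `to_phrase`, `rank_answers`, the 0.8 default and the example spans are all hypothetical.

```python
from collections import Counter

# Hypothetical stop-word list; Lexxe's actual linguistic processing is not described.
STOP_WORDS = {"a", "an", "the", "and", "or", "in", "on", "of", "to", "is", "are"}


def to_phrase(span: str) -> str:
    """Step 5 stand-in: trim stop words from both ends of a span but keep
    inner ones, so that 'Laos and Myanmar' survives as a single phrase."""
    words = span.split()
    while words and words[0].lower() in STOP_WORDS:
        words.pop(0)
    while words and words[-1].lower() in STOP_WORDS:
        words.pop()
    return " ".join(words)


def rank_answers(spans: list[str], closeness: float = 0.8) -> list[tuple[str, int]]:
    """Steps 6-7 stand-in: score phrases by raw frequency, rank them in
    descending order, and keep the best plus any scoring >= closeness * best."""
    counts = Counter(p for p in map(to_phrase, spans) if p)
    ranked = counts.most_common()
    if not ranked:
        return []
    best = ranked[0][1]
    return [(phrase, score) for phrase, score in ranked if score >= closeness * best]


# Made-up candidate spans, as if extracted from several retrieved sentences.
spans = ["Laos and Myanmar", "Laos and Myanmar in the", "Cambodia and",
         "Laos and Myanmar", "Malaysia", "Laos and Myanmar", "Cambodia", "Cambodia"]
print(rank_answers(spans))                 # [('Laos and Myanmar', 4)]
print(rank_answers(spans, closeness=0.7))  # [('Laos and Myanmar', 4), ('Cambodia', 3)]
```

Raw frequency is the crudest possible significance score. A real system would likely weight phrases against their frequency in general text. The relative `closeness` threshold captures the passage's idea of returning answers "very close in score to the best one".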

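 I still don't find this explanation very clear. Lexxe's site also mentions another third-generation search engine, Brainboost.com, which is worth a look too.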
