Okapi BM25 Algorithm Explained

Source: Internet  Editor: 程序博客网  Time: 2024/05/21 13:54

 

   In information retrieval, Okapi BM25 is a ranking function used by search engines to rank matching documents according to their relevance to a given search query. It is based on the probabilistic retrieval framework developed in the 1970s and 1980s by Stephen E. Robertson, Karen Spärck Jones, and others.

 

    BM25 is a bag-of-words retrieval function that ranks a set of documents based on the query terms appearing in each document, regardless of the inter-relationship between the query terms within a document (e.g., their relative proximity). It is not a single function, but actually a whole family of scoring functions, with slightly different components and parameters. One of the most prominent instantiations of the function is as follows.

 

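    BM25 formula:

    $$\mathrm{score}(D,Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot \frac{f(q_i,D) \cdot (k_1+1)}{f(q_i,D) + k_1 \cdot \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}$$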

    IDF formula:

    $$\mathrm{IDF}(q_i) = \log \frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}$$

 

 

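BM25 formula: 
score(D,Q): the score to be computed, i.e. the relevance of the given document D to the given query Q. A higher score means higher relevance. 
qi: the i-th term of the query Q. For English a term is a word; Chinese text must first be segmented into words. 
f(qi,D): the number of times the term qi occurs in the document D (its term frequency). 
|D|: the length of the document D, counted in words. 
avgdl: the average length of all documents in the index. 
k1 and b: two free parameters. k1 controls how quickly the contribution of term frequency saturates, and b controls how strongly the score is normalized by document length. k1 is commonly chosen between 1.2 and 2.0; here we take k1 = 2 and b = 0.75.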
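
To make the role of k1 and b concrete, the following is a minimal sketch of only the term-frequency factor of the score (the fraction that is multiplied by IDF). The function name `bm25_term_weight` and the sample lengths are assumptions made for illustration; they do not come from any particular library.

```python
def bm25_term_weight(freq, doc_len, avgdl, k1=2.0, b=0.75):
    """Saturating, length-normalized term-frequency factor of BM25 (IDF excluded).

    freq    : f(qi, D), occurrences of the term in the document
    doc_len : |D|, document length in words
    avgdl   : average document length in the index
    """
    # Length normalization: equals 1 for a document of average length,
    # larger than 1 for longer documents, smaller than 1 for shorter ones.
    norm = 1.0 - b + b * doc_len / avgdl
    return freq * (k1 + 1.0) / (freq + k1 * norm)


if __name__ == "__main__":
    # Hypothetical document of exactly average length (100 words).
    for freq in (1, 2, 5, 20):
        print(freq, round(bm25_term_weight(freq, doc_len=100, avgdl=100), 3))
```

For a document of average length the factor is 1.0 at one occurrence and 1.5 at two, and it keeps growing more slowly while staying below k1 + 1. With the same frequency, a longer-than-average document gets a smaller factor, which is the effect of b.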

 

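IDF formula: computes the value of IDF(qi) used in the BM25 score formula. 
N: the total number of documents in the index. 
n(qi): the number of documents in the index that contain the term qi.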
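
As a hedged end-to-end sketch, the code below combines the IDF definition with the score(D,Q) definition, using IDF(qi) = log((N - n(qi) + 0.5) / (n(qi) + 0.5)). The function names `idf` and `bm25_score` and the five-document toy corpus are invented for illustration, and documents are assumed to be pre-tokenized lists of words.

```python
import math
from collections import Counter


def idf(n_docs, doc_freq):
    """IDF(qi) = log((N - n(qi) + 0.5) / (n(qi) + 0.5))."""
    return math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_score(query_terms, doc_terms, corpus, k1=2.0, b=0.75):
    """Score one tokenized document against a tokenized query."""
    n_docs = len(corpus)                              # N
    avgdl = sum(len(d) for d in corpus) / n_docs      # average document length
    freqs = Counter(doc_terms)                        # f(qi, D) for every term
    norm = 1.0 - b + b * len(doc_terms) / avgdl       # length normalization
    score = 0.0
    for q in query_terms:
        f = freqs[q]
        if f == 0:
            continue                                  # absent terms add nothing
        n_q = sum(1 for d in corpus if q in d)        # n(qi)
        score += idf(n_docs, n_q) * f * (k1 + 1.0) / (f + k1 * norm)
    return score


# Toy corpus, made up for this example; every document has three words.
corpus = [
    ["okapi", "bm25", "ranking"],
    ["search", "engine", "ranking"],
    ["probabilistic", "retrieval", "model"],
    ["term", "frequency", "weight"],
    ["document", "length", "normalization"],
]
query = ["bm25", "ranking"]

if __name__ == "__main__":
    for doc in corpus:
        print(doc, round(bm25_score(query, doc, corpus), 4))
```

The first document matches both query terms and scores highest, the second matches only "ranking", and the rest score 0. Note that this form of IDF becomes negative for a term that appears in more than half of the documents, which is why some implementations clamp it or use a variant.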

 

 

 
